first commit

This commit is contained in:
Samir Saci
2021-11-01 23:12:52 +01:00
commit 4bd027b0d7
40 changed files with 6016 additions and 0 deletions

4
.gitignore vendored Executable file

@@ -0,0 +1,4 @@
.dist
venv/*
App/*
.notes.txt

BIN
1000lines_35m_3mpng.png Normal file

Binary file not shown (PNG image, 39 KiB).

5001
In/df_lines.csv Normal file

File diff suppressed because it is too large.

21
LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2021 Samir Saci
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

174
README.md Normal file

@@ -0,0 +1,174 @@
# Automate ABC Analysis & Product Segmentation with Streamlit 📈
*A statistical methodology to segment your products based on turnover and demand variability, automated with a web application built with the Streamlit framework*
<p align="center">
<img align="center" src="images/streamlit_capture.PNG" width=75%>
</p>
Product segmentation refers to the activity of grouping products that have similar characteristics and serve a similar market. It is usually related to marketing _(Sales Categories)_ or manufacturing _(Production Processes)_. However, as a **Supply Chain Engineer**, your focus is not on the product itself but on the complexity of managing its flow.
You want to understand the distribution of sales volumes (fast/slow movers) and the demand variability, so you can optimize your production, storage and delivery operations and ensure the best service level, considering:
- The highest contribution to your total volume: ABC Analysis
- The most unstable demand: Demand Variability
I have designed this **Streamlit App** to give **Supply Chain Engineers** a tool to segment the products of their portfolio, with a focus on retail, considering the demand variability and the volume contribution of each item.
### Understand the theory behind 📜
In this [Medium Article](https://towardsdatascience.com/product-segmentation-for-retail-with-python-c85cc0930f9a), you can find details about the theory used to build this tool.
# Access the application 🖥️
> Access it here: [Product Segmentation for Retail](https://share.streamlit.io/samirsaci/segmentation/main/segmentation.py)
## **Step 0: Why should you use it?**
This Streamlit Web Application has been designed for Supply Chain Engineers to support them in their Inventory Management. It will help you to automate product segmentation using statistics.
## **Step 1: What do you want to do?**
You have two ways to use this application:
- 🖥️ Look at the results computed by the model using the pre-loaded dataset: in that case you just need to scroll to see the visuals and the analyses
OR
- 💾 Upload your dataset of sales records that includes columns related to:
- **Item master data**
_For example: SKU ID, Category, Sub-Category, Store ID_
- **Date of the sales**:
_For example: Day, Week, Month, Year_
- **Quantity or value**: this measure will be used for the ABC analysis
_For example: units, cartons, pallets or euros/dollars/your local currency_
## **Step 2: Prepare the analysis**
### **1. 💾 Upload your dataset of sales records**
<p align="center">
<img align="center" src="images/step_1.PNG" width=40%>
</p>
💡 _Please make sure that your dataset is a CSV file smaller than 200 MB. If you need a higher limit, copy this repository and deploy the app locally following the instructions below._
### **2. 📅 [Parameters] select the columns for the date (day, week, year) and the values (quantity, $)**
<p align="center">
<img align="center" src="images/step_2.PNG" width=75%>
</p>
💡 _If you have several columns for the date (day, week, month) and for the values (quantity, amount), you can select only one column per category for each calculation run._
### **3. 📉 [Parameters] select all the columns you want to keep in the analysis**
<p align="center">
<img align="center" src="images/step_3.PNG" width=75%>
</p>
💡 _This step removes the columns you do not need for the analysis, which speeds up computation and reduces resource usage._
### **4. 🏬 [Parameters] select all the columns related to product master data (SKU ID, FAMILY, CATEGORY, STORE LOCATION)**
<p align="center">
<img align="center" src="images/step_4.PNG" width=75%>
</p>
💡 _In this step you set the granularity of your analysis (as illustrated in the sketch below). For example, it can be at:_
- _Item, Store level: the same item in two stores is counted as two SKUs_
- _Item ID level: the sales of each item are grouped across all stores_
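💡 _As an illustration, here is how the two granularities differ on a toy sales table (hypothetical column names SKU, STORE, QTY):_

```python
import pandas as pd

# The same item sold in two stores
df = pd.DataFrame({'SKU': ['X', 'X'], 'STORE': ['S1', 'S2'], 'QTY': [10, 20]})
print(df.groupby(['SKU', 'STORE'])['QTY'].sum())  # two SKUs (Item, Store level)
print(df.groupby(['SKU'])['QTY'].sum())           # one SKU (Item ID level)
```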
### **5. 🛍️ [Parameters] select one feature you want to use for analysis by family**
<p align="center">
<img align="center" src="images/step_5.PNG" width=75%>
</p>
💡 _This feature will be used to plot the distribution of (A, B, C) products by family_
### **6. 🖱️ Click on Start Calculation? to launch the analysis**
<p align="center">
<img align="center" src="images/step_6.PNG" width=75%>
</p>
💡 _Tick the Start Calculation? box to launch the computation with the parameters selected above_
# Get insights about your sales records 💡
### **Pareto Analysis**
<p align="center">
<img align="center" src="images/pareto.PNG" width=75%>
</p>
**INSIGHTS:**
1. How many SKUs represent 80% of your total sales?
2. What share of your total sales do your top 20% SKUs represent?
_For more information about the theory behind the Pareto law and its application in Supply Chain Management: [Pareto Principle for Warehouse Layout Optimization](https://towardsdatascience.com/reduce-warehouse-space-with-the-pareto-principle-using-python-e722a6babe0e)_
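_As an illustration, here is a minimal sketch of the underlying Pareto computation (hypothetical column names SKU and QTY; the application computes this for you):_

```python
import pandas as pd

df_sales = pd.DataFrame({'SKU': ['A1', 'A2', 'A3', 'A4'],
                         'QTY': [500, 300, 150, 50]})
df = df_sales.sort_values('QTY', ascending=False).reset_index(drop=True)
df['QTY_CUM_%'] = 100 * df['QTY'].cumsum() / df['QTY'].sum()
n_sku_80 = (df['QTY_CUM_%'] <= 80).sum()
print('{} SKUs out of {} represent 80% of the sales'.format(n_sku_80, len(df)))
```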
### **ABC Analysis with Demand Variability**
<p align="center">
<img align="center" src="images/abc_analysis.PNG" width=75%>
</p>
**QUESTIONS: WHAT IS THE PROPORTION OF?**
1. **LOW IMPORTANCE SKUS**: C references
2. **STABLE DEMAND SKUS**: A and B SKUs with a coefficient of variation below 1
3. **HIGH IMPORTANCE SKUS**: A and B SKUS with a high coefficient of variation
Your inventory management strategies will be impacted by this split:
- Minimum effort should be put into **LOW IMPORTANCE SKUs**
- Automated rules with moderate attention for **STABLE DEMAND SKUs**
- Complex replenishment rules and careful attention for **HIGH IMPORTANCE SKUs**
_For more information: [Medium Article](https://towardsdatascience.com/product-segmentation-for-retail-with-python-c85cc0930f9a)_
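_A minimal sketch of how these two dimensions can be computed, assuming weekly sales with hypothetical columns SKU and QTY (ABC cut-offs at 80% and 95% of the cumulated sales, coefficient of variation CV = σ/μ):_

```python
import pandas as pd

df_weekly = pd.DataFrame({'SKU': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
                          'QTY': [100, 110, 40, 5, 10, 12]})
g = df_weekly.groupby('SKU')['QTY']
df = pd.DataFrame({'TOTAL': g.sum(), 'CV': g.std() / g.mean()})
df = df.sort_values('TOTAL', ascending=False)
share = df['TOTAL'].cumsum() / df['TOTAL'].sum()
df['ABC'] = pd.cut(share, bins=[0, 0.8, 0.95, 1], labels=['A', 'B', 'C'])
print(df)
```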
<p align="center">
<img align="center" src="images/split_category.PNG" width=75%>
</p>
**QUESTIONS:**
1. What is the split of SKUS by FAMILY?
2. What is the split of SKUS by ABC class in each FAMILY?
### **Normality Test**
<p align="center">
<img align="center" src="images/normality.PNG" width=75%>
</p>
**QUESTION:**
- Which SKUs have a sales distribution that follows a normal distribution?
Many inventory rules and safety stock formulas apply only if the sales distribution of your item follows a normal distribution. Therefore, it is better to know the percentage of your portfolio that can be managed easily.
_For more information: [Inventory Management for Retail — Stochastic Demand](https://towardsdatascience.com/inventory-management-for-retail-stochastic-demand-3020a43d1c14)_
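_A minimal sketch of such a test with the Shapiro-Wilk test from SciPy (illustrative 5% significance level):_

```python
import numpy as np
from scipy.stats import shapiro

np.random.seed(42)
sales = np.random.normal(100, 15, 52)  # one year of weekly sales for one SKU
stat, p = shapiro(sales)
print('normal' if p > 0.05 else 'not normal')
```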
# Build the application locally 🏗️
## **Build a local Python environment (recommended)**
### Install **virtualenv** using pip3

    sudo pip3 install virtualenv

### Create a virtual environment

    virtualenv venv

### Activate your virtual environment

    source venv/bin/activate

## Launch Streamlit 🚀
### Install all the dependencies needed using requirements.txt

    pip install -r requirements.txt

### Run the application

    streamlit run segmentation.py
### Click on the Network URL in the shell
<p align="center">
<img align="center" src="images/network.PNG" width=50%>
</p>
> Enjoy!
# About me 🤓
Senior Supply Chain Engineer with international experience in Logistics and Transportation operations. \
Have a look at my portfolio: [Data Science for Supply Chain Portfolio](https://samirsaci.com) \
Data Science for Warehousing📦, Transportation 🚚 and Demand Forecasting 📈

124
app.py Executable file

@@ -0,0 +1,124 @@
import pandas as pd
import numpy as np
import plotly.express as px
from utils.routing.distances import (
distance_picking,
next_location
)
from utils.routing.routes import (
create_picking_route
)
from utils.batch.mapping_batch import (
orderlines_mapping,
locations_listing
)
from utils.cluster.mapping_cluster import (
df_mapping
)
from utils.batch.simulation_batch import (
simulation_wave,
simulate_batch
)
from utils.cluster.simulation_cluster import(
loop_wave,
simulation_cluster,
create_dataframe,
process_methods
)
from utils.results.plot import (
plot_simulation1,
plot_simulation2
)
import streamlit as st
# Set page configuration
st.set_page_config(page_title="Improve Warehouse Productivity using Order Batching",
                   initial_sidebar_state="expanded",
                   layout='wide',
                   page_icon="🛒")
# Cached data loading
@st.cache(persist=False,
          allow_output_mutation=True,
          suppress_st_warning=True,
          show_spinner=True)
def load(filename, n):
    '''Load the first n order lines of the dataset'''
    df_orderlines = pd.read_csv(IN + filename).head(n)
    return df_orderlines
# Alley Coordinates on y-axis
y_low, y_high = 5.5, 50
# Origin Location
origin_loc = [0, y_low]
# Distance Threshold (m)
distance_threshold = 35
distance_list = [1] + [i for i in range(5, 100, 5)]
IN = 'In/'
# Store Results by WaveID
list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult = [], [], [], [], [], [], []
list_results = [list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult] # Group in list
# Store Results by Simulation (Order_number)
list_ordnum , list_dstw = [], []
# Simulation 1: Order Batch
# SCOPE SIZE
st.header("**🥇 Impact of the wave size in orders (Orders/Wave)**")
st.subheader('''
🛠️ HOW MANY ORDER LINES DO YOU WANT TO INCLUDE IN YOUR ANALYSIS?
''')
col1, col2 = st.beta_columns(2)
with col1:
    n = st.slider(
        'SIMULATION 1 SCOPE (THOUSAND ORDERS)', 1, 200, value=5)
with col2:
    lines_number = 1000 * n
    st.write('''🛠️ {:,} order lines'''.format(lines_number))
# SIMULATION PARAMETERS
st.subheader('''
🛠️ SIMULATE ORDER PICKING BY WAVE OF N ORDERS PER WAVE WITH N IN [N_MIN, N_MAX] ''')
col_11, col_22 = st.beta_columns(2)
with col_11:
    n1 = st.slider(
        'SIMULATION 1: N_MIN (ORDERS/WAVE)', 0, 20, value=1)
    n2 = st.slider(
        'SIMULATION 1: N_MAX (ORDERS/WAVE)', n1 + 1, 20, value=int(np.max([n1 + 1, 10])))
with col_22:
    st.write('''[N_MIN, N_MAX] = [{:,}, {:,}]'''.format(n1, n2))
# START CALCULATION
start_1 = False
if st.checkbox('SIMULATION 1: START CALCULATION', key='show', value=False):
    start_1 = True
# Calculation
if start_1:
    df_orderlines = load('df_lines.csv', lines_number)
    df_waves, df_results = simulate_batch(n1, n2, y_low, y_high, origin_loc, lines_number, df_orderlines)
    plot_simulation1(df_results, lines_number)
# Simulation 2: Order Batch using Spatial Clustering
# SCOPE SIZE
st.header("**🥈 Impact of the wave size using spatial clustering (Orders/Wave)**")
st.subheader('''
🛠️ HOW MANY ORDER LINES DO YOU WANT TO INCLUDE IN YOUR ANALYSIS?
''')
col1, col2 = st.beta_columns(2)
with col1:
    n_ = st.slider(
        'SIMULATION 2 SCOPE (THOUSAND ORDERS)', 1, 200, value=5)
with col2:
    lines_2 = 1000 * n_
    st.write('''🛠️ {:,} order lines'''.format(lines_2))
# START CALCULATION
start_2 = False
if st.checkbox('SIMULATION 2: START CALCULATION', key='show_2', value=False):
    start_2 = True
# Calculation
if start_2:
    df_orderlines = load('df_lines.csv', lines_2)
    df_reswave, df_results = simulation_cluster(y_low, y_high, df_orderlines, list_results, n1, n2,
                                                distance_threshold)
    plot_simulation2(df_reswave, lines_2, distance_threshold)

41
notes.txt Normal file

@@ -0,0 +1,41 @@
# Example Artefact
https://github.com/MaximeLutel/streamlit_prophet
https://streamlit.io/gallery?category=model-building-training
- The Math of the Prophet
https://medium.com/future-vision/the-math-of-prophet-46864fa9c55a
# INSTALL NODE
https://docs.microsoft.com/fr-fr/windows/dev-environment/javascript/nodejs-on-wsl
# Ubuntu WSL VS Code
https://code.visualstudio.com/docs/remote/wsl
- Grant admin rights to write and to download libraries
sudo chown -R samirs streamlit_prophet
# Move local directory of Windows to Local Linux
mkdir app
cp -R /mnt/c/Data/62-\ Projects/24-\ Articles/25-\ Improve\ Warehouse\ Productivity/App ~/app
cd ~/App
code .
# Github
git config --global user.email "samir.saci@outlook.com"
git config --global user.name "Samir Saci"
git remote add origin 'https://github.com/samirsaci/segmentation.git'
git push -u origin main
# Install virtualenv
pip install virtualenv
python3.8 -m virtualenv venv
source venv/bin/activate
# Activate Streamlit
streamlit run segmentation.py --server.address 0.0.0.0
streamlit run app.py --server.address 0.0.0.0
# SEGMENTATION TO DO
1) FAMILY = F(SKU SCOPE)
2) ITEM = ITEM LIST - FAMILY
C:\Data\62- Projects\24- Articles\25- Improve Warehouse Productivity\App

93
requirements.txt Normal file

@@ -0,0 +1,93 @@
absl-py==0.15.0
altair==4.1.0
argon2-cffi==21.1.0
astor==0.8.1
attrs==21.2.0
backcall==0.2.0
backports.zoneinfo==0.2.1
base58==2.1.1
bleach==4.1.0
blinker==1.4
cachetools==4.2.4
certifi==2021.10.8
cffi==1.15.0
charset-normalizer==2.0.7
click==8.0.3
cycler==0.11.0
debugpy==1.5.1
decorator==5.1.0
defusedxml==0.7.1
entrypoints==0.3
et-xmlfile==1.1.0
gitdb==4.0.9
GitPython==3.1.24
idna==3.3
ipykernel==6.4.2
ipython==7.29.0
ipython-genutils==0.2.0
ipywidgets==7.6.5
jedi==0.18.0
Jinja2==3.0.2
jsonschema==4.1.2
jupyter-client==7.0.6
jupyter-core==4.9.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.2
kiwisolver==1.3.2
MarkupSafe==2.0.1
matplotlib==3.4.3
matplotlib-inline==0.1.3
mistune==0.8.4
nbclient==0.5.4
nbconvert==6.2.0
nbformat==5.1.3
nest-asyncio==1.5.1
notebook==6.4.5
numpy==1.21.3
openpyxl==3.0.9
ortools==9.1.9490
packaging==21.2
pandas==1.3.4
pandocfilters==1.5.0
parso==0.8.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.4.0
plotly==5.3.1
prometheus-client==0.12.0
prompt-toolkit==3.0.21
protobuf==3.19.1
ptyprocess==0.7.0
pyarrow==6.0.0
pycparser==2.20
pydeck==0.7.1
Pygments==2.10.0
pyparsing==2.4.7
pyrsistent==0.18.0
python-dateutil==2.8.2
pytz==2021.3
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0
pyzmq==22.3.0
requests==2.26.0
scipy==1.7.1
Send2Trash==1.8.0
six==1.16.0
smmap==5.0.0
streamlit==0.77.0
tenacity==8.0.1
terminado==0.12.1
testpath==0.5.0
toml==0.10.2
toolz==0.11.1
tornado==6.1
traitlets==5.1.1
typing-extensions==3.10.0.2
tzdata==2021.5
tzlocal==4.1
urllib3==1.26.7
validators==0.18.2
watchdog==2.1.6
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.2

15 binary files not shown (including two PNG images, 33 KiB and 39 KiB).

30
utils/batch/mapping_batch.py Normal file

@@ -0,0 +1,30 @@
import pandas as pd
import itertools
from ast import literal_eval

def orderlines_mapping(df_orderlines, orders_number):
    '''Map each order line with a wave number'''
    df_orderlines.sort_values(by='DATE', ascending=True, inplace=True)
    # Unique order numbers list
    list_orders = df_orderlines.OrderNumber.unique()
    # range(1, len + 1) so the last order is also mapped
    dict_map = dict(zip(list_orders, range(1, len(list_orders) + 1)))
    # Order ID mapping
    df_orderlines['OrderID'] = df_orderlines['OrderNumber'].map(dict_map)
    # Grouping orders by waves of orders_number
    df_orderlines['WaveID'] = (df_orderlines.OrderID % orders_number == 0).shift(1).fillna(0).cumsum()
    # Counting the number of waves
    waves_number = df_orderlines.WaveID.max() + 1
    return df_orderlines, waves_number

def locations_listing(df_orderlines, wave_id):
    '''Get the storage locations to cover for a wave of orders'''
    df = df_orderlines[df_orderlines.WaveID == wave_id]
    # Create coordinates listing
    list_locs = list(df['Coord'].apply(lambda t: literal_eval(t)).values)
    list_locs.sort()
    # List of unique coordinates
    list_locs = list(k for k, _ in itertools.groupby(list_locs))
    n_locs = len(list_locs)
    return list_locs, n_locs
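
if __name__ == '__main__':
    # Illustration of the WaveID trick above with orders_number = 3 (toy data):
    # a new wave starts on the line after each OrderID divisible by 3
    order_id = pd.Series([1, 2, 3, 4, 5, 6, 7])
    print((order_id % 3 == 0).shift(1).fillna(0).cumsum().tolist())  # [0, 0, 0, 1, 1, 1, 2]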

41
utils/batch/simulation_batch.py Normal file

@@ -0,0 +1,41 @@
import pandas as pd
from utils.batch.mapping_batch import *
from utils.cluster.mapping_cluster import *
from utils.routing.routes import *

def simulation_wave(y_low, y_high, origin_loc, orders_number, df_orderlines, list_wid, list_dst, list_route, list_ord):
    '''Simulate the total picking distance with n orders per wave'''
    distance_route = 0
    # Create waves
    df_orderlines, waves_number = orderlines_mapping(df_orderlines, orders_number)
    for wave_id in range(waves_number):
        # Listing of all locations for this wave
        # (locations_listing resolves to the 4-value version of utils.cluster.clustering via the wildcard imports)
        list_locs, n_locs, n_lines, n_pcs = locations_listing(df_orderlines, wave_id)
        # Results
        wave_distance, list_chemin = create_picking_route(origin_loc, list_locs, y_low, y_high)
        distance_route = distance_route + wave_distance
        list_wid.append(wave_id)
        list_dst.append(wave_distance)
        list_route.append(list_chemin)
        list_ord.append(orders_number)
    return list_wid, list_dst, list_route, list_ord, distance_route

def simulate_batch(n1, n2, y_low, y_high, origin_loc, orders_number, df_orderlines):
    '''Loop over several scenarios of n orders per wave'''
    # Lists for results
    list_wid, list_dst, list_route, list_ord = [], [], [], []
    # Test several values of orders per wave
    for orders_number in range(n1, n2 + 1):
        list_wid, list_dst, list_route, list_ord, distance_route = simulation_wave(y_low, y_high, origin_loc, orders_number,
                                                                                   df_orderlines, list_wid, list_dst, list_route, list_ord)
        print("Total distance covered for {} orders/wave: {:,} m".format(orders_number, distance_route))
    # Results by wave
    df_waves = pd.DataFrame({'wave': list_wid,
                             'distance': list_dst,
                             'routes': list_route,
                             'order_per_wave': list_ord})
    # Aggregated results
    df_results = pd.DataFrame(df_waves.groupby(['order_per_wave'])['distance'].sum())
    df_results.columns = ['distance']
    return df_waves, df_results.reset_index()
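
if __name__ == '__main__':
    # Quick illustrative run on toy order lines (hypothetical data; columns
    # follow what the pipeline expects: DATE, OrderNumber, SKU, Coord, PCS)
    df = pd.DataFrame({'DATE': ['2021-01-01'] * 4,
                       'OrderNumber': [1, 1, 2, 3],
                       'SKU': ['A', 'B', 'A', 'C'],
                       'Coord': ['[0, 10]', '[2, 20]', '[4, 30]', '[6, 15]'],
                       'PCS': [1, 2, 1, 3]})
    # Waves of 1 and 2 orders, alleys between y = 5.5 and y = 50, origin at [0, 5.5]
    df_waves, df_results = simulate_batch(1, 2, 5.5, 50, [0, 5.5], len(df), df)
    print(df_results)  # total walking distance per wave size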

Binary file not shown.

98
utils/cluster/clustering.py Normal file

@@ -0,0 +1,98 @@
import numpy as np
import pandas as pd
import itertools
from ast import literal_eval
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans2, whiten
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import ward, fcluster
from utils.routing.distances import *
def cluster_locations(list_coord, distance_threshold, dist_method, clust_start):
    '''Step 1: Create clusters of locations'''
    # Create the linkage matrix
    if dist_method == 'euclidian':
        Z = ward(pdist(np.stack(list_coord)))
    else:
        Z = ward(pdist(np.stack(list_coord), metric=distance_picking_cluster))
    # Flat cluster array
    fclust1 = fcluster(Z, t=distance_threshold, criterion='distance')
    return fclust1

def clustering_mapping(df, distance_threshold, dist_method, orders_number, wave_start, clust_start, df_type):
    '''Step 2: Clustering and mapping'''
    # 1. Create clusters
    list_coord, list_OrderNumber, clust_id, df = cluster_wave(df, distance_threshold, dist_method, clust_start, df_type)
    clust_idmax = max(clust_id)  # Last cluster ID
    # 2. Map order lines
    dict_map, dict_omap, df, Wave_max = lines_mapping_clst(df, list_coord, list_OrderNumber, clust_id, orders_number, wave_start)
    return dict_map, dict_omap, df, Wave_max, clust_idmax

def cluster_wave(df, distance_threshold, dist_method, clust_start, df_type):
    '''Step 3: Create waves by clusters'''
    # Create the column used for clustering
    if df_type == 'df_mono':
        df['Coord_Cluster'] = df['Coord']
    # Mapping points
    df_map = pd.DataFrame(df.groupby(['OrderNumber', 'Coord_Cluster'])['SKU'].count()).reset_index()  # Here we use Coord_Cluster
    list_coord, list_OrderNumber = np.stack(df_map.Coord_Cluster.apply(lambda t: literal_eval(t)).values), df_map.OrderNumber.values
    # Cluster picking locations
    clust_id = cluster_locations(list_coord, distance_threshold, dist_method, clust_start)
    clust_id = [(i + clust_start) for i in clust_id]
    # List of coordinates
    list_coord = np.stack(list_coord)
    return list_coord, list_OrderNumber, clust_id, df

def lines_mapping(df, orders_number, wave_start):
    '''Step 4: Map order lines without clustering'''
    # Unique order numbers list
    list_orders = df.OrderNumber.unique()
    # Dictionary for mapping (range(1, len + 1) so the last order is also mapped)
    dict_map = dict(zip(list_orders, range(1, len(list_orders) + 1)))
    # Order ID mapping
    df['OrderID'] = df['OrderNumber'].map(dict_map)
    # Grouping orders by waves of orders_number
    df['WaveID'] = (df.OrderID % orders_number == 0).shift(1).fillna(0).cumsum() + wave_start
    # Counting the number of waves
    waves_number = df.WaveID.max() + 1
    return df, waves_number

def lines_mapping_clst(df, list_coord, list_OrderNumber, clust_id, orders_number, wave_start):
    '''Step 4: Map order lines with clustering'''
    # Dictionary for mapping by cluster
    dict_map = dict(zip(list_OrderNumber, clust_id))
    # Dataframe mapping
    df['ClusterID'] = df['OrderNumber'].map(dict_map)
    # Order by cluster ID and order number
    df = df.sort_values(['ClusterID', 'OrderNumber'], ascending=True)
    list_orders = list(df.OrderNumber.unique())
    # Dictionary for order mapping
    dict_omap = dict(zip(list_orders, range(1, len(list_orders) + 1)))
    # Order ID mapping
    df['OrderID'] = df['OrderNumber'].map(dict_omap)
    # Create waves: increment when reaching orders_number or changing cluster
    df['WaveID'] = wave_start + ((df.OrderID % orders_number == 0) | (df.ClusterID.diff() != 0)).shift(1).fillna(0).cumsum()
    wave_max = df.WaveID.max()
    return dict_map, dict_omap, df, wave_max

def locations_listing(df_orderlines, wave_id):
    '''Step 5: List the locations to visit for a wave of orders'''
    # Filter by wave_id
    df = df_orderlines[df_orderlines.WaveID == wave_id]
    # Create coordinates listing
    list_coord = list(df['Coord'].apply(lambda t: literal_eval(t)).values)  # Here we use Coord for the distance
    list_coord.sort()
    # Keep unique coordinates
    list_coord = list(k for k, _ in itertools.groupby(list_coord))
    n_locs = len(list_coord)
    n_lines = len(df)
    n_pcs = df.PCS.sum()
    return list_coord, n_locs, n_lines, n_pcs
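
if __name__ == '__main__':
    # Toy check of the clustering step: four picking locations, Ward linkage,
    # 35 m distance threshold (same pattern as cluster_locations above)
    coords = np.array([[0, 6], [1, 6], [20, 40], [21, 41]])
    print(fcluster(ward(pdist(coords)), t=35, criterion='distance'))  # two clusters, e.g. [1 1 2 2]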

37
utils/cluster/mapping_cluster.py Normal file

@@ -0,0 +1,37 @@
from utils.cluster.clustering import *
from utils.process.processing import *
from utils.routing.distances import *
def df_mapping(df_orderlines, orders_number, distance_threshold, mono_method, multi_method):
    '''Map the order lines dataframe with wave numbers, using clustering where requested'''
    # Filter mono and multi line orders
    df_mono, df_multi = process_lines(df_orderlines)
    wave_start = 0
    clust_start = 0
    # Mapping for single line orders
    if mono_method == 'clustering':
        df_type = 'df_mono'
        dict_map, dict_omap, df_mono, waves_number, clust_idmax = clustering_mapping(df_mono, distance_threshold, 'custom',
                                                                                     orders_number, wave_start, clust_start, df_type)
    else:
        df_mono, waves_number = lines_mapping(df_mono, orders_number, 0)
        clust_idmax = 0
    # Start values for the multi line step
    wave_start = waves_number
    clust_start = clust_idmax
    # Mapping for multi line orders
    if multi_method == 'clustering':
        df_type = 'df_multi'
        df_multi = centroid_mapping(df_multi)
        dict_map, dict_omap, df_multi, waves_number, clust_idmax = clustering_mapping(df_multi, distance_threshold, 'custom',
                                                                                      orders_number, wave_start, clust_start, df_type)
    else:
        df_multi, waves_number = lines_mapping(df_multi, orders_number, wave_start)
    # Final concatenation
    df_orderlines, waves_number = monomult_concat(df_mono, df_multi)
    return df_orderlines, waves_number

150
utils/cluster/simulation_cluster.py Normal file

@@ -0,0 +1,150 @@
import pandas as pd
import matplotlib.pyplot as plt
from utils.cluster.mapping_cluster import *
from utils.routing.routes import *

def simulation_wave(y_low, y_high, orders_number, df_orderlines, list_results, distance_threshold, mono_method, multi_method):
    '''Simulate the picking distance for a number of orders per wave'''
    # Lists to store the values
    [list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult] = [list_results[i] for i in range(len(list_results))]
    # Variable to store the total distance
    distance_route = 0
    origin_loc = [0, y_low]
    # Mapping of order lines with wave numbers
    df_orderlines, waves_number = df_mapping(df_orderlines, orders_number, distance_threshold, mono_method, multi_method)
    # Loop on the waves
    for wave_id in range(waves_number):
        # Listing of all locations for this wave
        list_locs, n_locs, n_lines, n_pcs = locations_listing(df_orderlines, wave_id)
        # Create the picking route
        wave_distance, list_chemin, distance_max = create_picking_route_cluster(origin_loc, list_locs, y_low, y_high)
        # Total walking distance
        distance_route = distance_route + wave_distance
        # Method used for this scenario
        monomult = mono_method + '-' + multi_method
        # Add the results
        list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult = append_results(list_wid, list_dst, list_route,
            list_ord, list_lines, list_pcs, list_monomult, wave_id, wave_distance, list_chemin, orders_number, n_lines, n_pcs, monomult)
    # List of results
    list_results = [list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult]
    return list_results, distance_route

def loop_wave(y_low, y_high, df_orderlines, list_results, n1, n2, distance_threshold, mono_method, multi_method):
    '''Simulate a scenario for each number of orders per wave'''
    # Lists for the records
    list_ordnum, list_dstw = [], []
    lines_number = len(df_orderlines)
    # Test several values of orders per wave
    for orders_number in range(n1, n2):
        # Scenario with orders/wave = orders_number
        list_results, distance_route = simulation_wave(y_low, y_high, orders_number, df_orderlines, list_results,
                                                       distance_threshold, mono_method, multi_method)
        # Append the results per wave
        list_ordnum.append(orders_number)
        list_dstw.append(distance_route)
        print("{} orders/wave: {:,} m".format(orders_number, distance_route))
    # Output lists
    [list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult] = [list_results[i] for i in range(len(list_results))]
    # Output results per wave
    df_results, df_reswave = create_dataframe(list_wid, list_dst, list_route, list_ord,
                                              distance_route, list_lines, list_pcs, list_monomult, list_ordnum, list_dstw)
    return list_results, df_reswave

def simulation_cluster(y_low, y_high, df_orderlines, list_results, n1, n2, distance_threshold):
    '''Simulate the three batching methods'''
    # Loop_wave: Simulation 1 (no clustering)
    mono_method, multi_method = 'normal', 'normal'
    list_results, df_reswave1 = loop_wave(y_low, y_high, df_orderlines, list_results, n1, n2,
                                          distance_threshold, mono_method, multi_method)
    # Loop_wave: Simulation 2 (clustering on single line orders)
    mono_method, multi_method = 'clustering', 'normal'
    list_results, df_reswave2 = loop_wave(y_low, y_high, df_orderlines, list_results, n1, n2,
                                          distance_threshold, mono_method, multi_method)
    # Loop_wave: Simulation 3 (clustering on single line orders and centroids of multi line orders)
    mono_method, multi_method = 'clustering', 'clustering'
    list_results, df_reswave3 = loop_wave(y_low, y_high, df_orderlines, list_results, n1, n2,
                                          distance_threshold, mono_method, multi_method)
    # Expand the results
    [list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult] = [list_results[i] for i in range(len(list_results))]
    lines_number = len(df_orderlines)
    # Results
    df_results = pd.DataFrame({'wave_number': list_wid,
                               'distance': list_dst,
                               'chemins': list_route,
                               'order_per_wave': list_ord,
                               'lines': list_lines,
                               'pcs': list_pcs,
                               'mono_multi': list_monomult})
    # Final processing
    df_reswave = process_methods(df_reswave1, df_reswave2, df_reswave3, lines_number, distance_threshold)
    return df_reswave, df_results

def create_dataframe(list_wid, list_dst, list_route, list_ord, distance_route, list_lines, list_pcs, list_monomult, list_ordnum, list_dstw):
    '''Create the dataframes of results'''
    # Results by wave
    df_results = pd.DataFrame({'wave_number': list_wid,
                               'distance': list_dst,
                               'chemin': list_route,
                               'orders_per_wave': list_ord,
                               'lines': list_lines,
                               'pcs': list_pcs,
                               'mono_multi': list_monomult})
    # Results by number of orders per wave
    df_reswave = pd.DataFrame({
        'orders_number': list_ordnum,
        'distance': list_dstw
    })
    return df_results, df_reswave

def append_results(list_wid, list_dst, list_route, list_ord, list_lines,
                   list_pcs, list_monomult, wave_id, wave_distance, list_chemin, orders_number, n_lines, n_pcs, monomult):
    '''Append the results of one wave'''
    list_wid.append(wave_id)
    list_dst.append(wave_distance)
    list_route.append(list_chemin)
    list_ord.append(orders_number)
    list_lines.append(n_lines)
    list_pcs.append(n_pcs)
    list_monomult.append(monomult)
    return list_wid, list_dst, list_route, list_ord, list_lines, list_pcs, list_monomult

def process_methods(df_reswave1, df_reswave2, df_reswave3, lines_number, distance_threshold):
    '''Process the results of the three methods'''
    # Rename the distance columns before joining the three dataframes
    df_reswave1.rename(columns={"distance": "distance_method_1"}, inplace=True)
    df_reswave2.rename(columns={"distance": "distance_method_2"}, inplace=True)
    df_reswave3.rename(columns={"distance": "distance_method_3"}, inplace=True)
    df_reswave = df_reswave1.set_index('orders_number')
    # Add the two other methods
    df_reswave['distance_method_2'] = df_reswave2.set_index('orders_number')['distance_method_2']
    df_reswave['distance_method_3'] = df_reswave3.set_index('orders_number')['distance_method_3']
    df_reswave.reset_index().plot.bar(x='orders_number', y=['distance_method_1', 'distance_method_2', 'distance_method_3'],
                                      figsize=(10, 6), color=['black', 'red', 'blue'])
    plt.title("Picking Route Distance for {:,} Order lines / {} m distance threshold".format(lines_number, distance_threshold))
    plt.ylabel('Walking Distance (m)')
    plt.xlabel('Orders per Wave (Orders/Wave)')
    plt.savefig("{}lines_{}m_3mpng".format(lines_number, distance_threshold))
    plt.show()
    return df_reswave

Binary file not shown.

31
utils/process/processing.py Normal file

@@ -0,0 +1,31 @@
import pandas as pd
def process_lines(df_orderlines):
    '''Split the order lines into single line and multi line orders'''
    # Count the lines per order
    df_nline = pd.DataFrame(df_orderlines.groupby(['OrderNumber'])['SKU'].count())
    # Lists
    list_ord = list(df_nline.index.astype(int).values)
    list_lines = list(df_nline['SKU'].values.astype(int))
    # Mapping
    dict_nline = dict(zip(list_ord, list_lines))
    df_orderlines['N_lines'] = df_orderlines['OrderNumber'].map(dict_nline)
    # Split (copies avoid SettingWithCopyWarning on later assignments)
    df_mono = df_orderlines[df_orderlines['N_lines'] == 1].copy()
    df_multi = df_orderlines[df_orderlines['N_lines'] > 1].copy()
    del df_orderlines
    return df_mono, df_multi

def monomult_concat(df_mono, df_multi):
    '''Concatenate single line and multi line orders'''
    # Original coordinate for mono line orders
    df_mono['Coord_Cluster'] = df_mono['Coord']
    # Dataframe concatenation
    df_orderlines = pd.concat([df_mono, df_multi])
    # Counting the number of waves
    waves_number = df_orderlines.WaveID.max() + 1
    return df_orderlines, waves_number
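
if __name__ == '__main__':
    # Toy split: order 1 has two lines (multi), order 2 has one line (mono)
    df = pd.DataFrame({'OrderNumber': [1, 1, 2],
                       'SKU': ['A', 'B', 'A'],
                       'Coord': ['[0, 10]', '[2, 20]', '[4, 30]']})
    df_mono, df_multi = process_lines(df)
    print(len(df_mono), len(df_multi))  # 1 2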

Binary file not shown.

31
utils/results/plot.py Normal file

@@ -0,0 +1,31 @@
import matplotlib.pyplot as plt
import plotly.express as px
import streamlit as st
def plot_simulation1(df_results, lines_number):
    '''Plot the simulation of the batch size'''
    fig = px.bar(data_frame=df_results,
                 width=1200,
                 height=600,
                 x='order_per_wave',
                 y='distance',
                 labels={
                     'order_per_wave': 'Wave size (Orders/Wave)',
                     'distance': 'Total Picking Walking Distance (m)'})
    fig.update_traces(marker_line_width=1, marker_line_color="black")
    st.write(fig)

def plot_simulation2(df_reswave, lines_number, distance_threshold):
    '''Plot the comparison of the three batching methods'''
    fig = px.bar(data_frame=df_reswave.reset_index(),
                 width=1200,
                 height=600,
                 x='orders_number',
                 y=['distance_method_1', 'distance_method_2', 'distance_method_3'],
                 labels={
                     'orders_number': 'Wave size (Orders/Wave)',
                     'distance_method_1': 'NO CLUSTERING APPLIED',
                     'distance_method_2': 'CLUSTERING ON SINGLE LINE ORDERS',
                     'distance_method_3': 'CLUSTERING ON SINGLE LINE AND CENTROID FOR MULTI LINE'},
                 barmode="group")
    fig.update_traces(marker_line_width=1, marker_line_color="black")
    st.write(fig)

Binary file not shown.

Binary file not shown.

84
utils/routing/distances.py Normal file

@@ -0,0 +1,84 @@
import numpy as np
import pandas as pd
from ast import literal_eval

def distance_picking(Loc1, Loc2, y_low, y_high):
    '''Calculate the picker route distance between two locations'''
    # Start point
    x1, y1 = Loc1[0], Loc1[1]
    # End point
    x2, y2 = Loc2[0], Loc2[1]
    # Distance on the x-axis
    distance_x = abs(x2 - x1)
    # Distance on the y-axis
    if x1 == x2:
        distance_y1 = abs(y2 - y1)
        distance_y2 = distance_y1
    else:
        distance_y1 = (y_high - y1) + (y_high - y2)
        distance_y2 = (y1 - y_low) + (y2 - y_low)
    # Minimum distance on the y-axis
    distance_y = min(distance_y1, distance_y2)
    # Total distance
    distance = distance_x + distance_y
    return int(distance)

def next_location(start_loc, list_locs, y_low, y_high):
    '''Find the closest next location'''
    # Distance to every candidate location
    list_dist = [distance_picking(start_loc, i, y_low, y_high) for i in list_locs]
    # Minimum distance
    distance_next = min(list_dist)
    # Location with the minimum distance
    index_min = list_dist.index(min(list_dist))
    next_loc = list_locs[index_min]
    list_locs.remove(next_loc)
    return list_locs, start_loc, next_loc, distance_next

def centroid(list_in):
    '''Centroid of a list of locations'''
    x, y = [p[0] for p in list_in], [p[1] for p in list_in]
    centroid = [round(sum(x) / len(list_in), 2), round(sum(y) / len(list_in), 2)]
    return centroid

def centroid_mapping(df_multi):
    '''Map each multi line order to the centroid of its locations'''
    # Convert the coordinates to lists
    df_multi['Coord'] = df_multi['Coord'].apply(literal_eval)
    # Group coordinates per order
    df_group = pd.DataFrame(df_multi.groupby(['OrderNumber'])['Coord'].apply(list)).reset_index()
    # Calculate the centroid
    df_group['Coord_Centroid'] = df_group['Coord'].apply(centroid)
    # Dictionary for mapping
    list_order, list_coord = list(df_group.OrderNumber.values), list(df_group.Coord_Centroid.values)
    dict_coord = dict(zip(list_order, list_coord))
    # Final mapping
    df_multi['Coord_Cluster'] = df_multi['OrderNumber'].map(dict_coord).astype(str)
    df_multi['Coord'] = df_multi['Coord'].astype(str)
    return df_multi

def distance_picking_cluster(point1, point2):
    '''Picker route distance with fixed alley coordinates, usable as a pdist metric'''
    y_low, y_high = 5.5, 50
    # Start point
    x1, y1 = point1[0], point1[1]
    # End point
    x2, y2 = point2[0], point2[1]
    # Distance on the x-axis
    distance_x = abs(x2 - x1)
    # Distance on the y-axis
    if x1 == x2:
        distance_y1 = abs(y2 - y1)
        distance_y2 = distance_y1
    else:
        distance_y1 = (y_high - y1) + (y_high - y2)
        distance_y2 = (y1 - y_low) + (y2 - y_low)
    # Minimum distance on the y-axis
    distance_y = min(distance_y1, distance_y2)
    # Total distance
    distance = distance_x + distance_y
    return distance
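
if __name__ == '__main__':
    # Worked example with the alley bounds used in app.py (y_low = 5.5, y_high = 50):
    # from [0, 10] to [6, 20], the bottom detour costs (10 - 5.5) + (20 - 5.5) = 19 m,
    # the top detour (50 - 10) + (50 - 20) = 70 m, so distance = 6 + min(19, 70) = 25 m
    print(distance_picking([0, 10], [6, 20], 5.5, 50))  # 25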

56
utils/routing/routes.py Normal file

@@ -0,0 +1,56 @@
from utils.routing.distances import *

def create_picking_route(origin_loc, list_locs, y_low, y_high):
    '''Calculate the total distance to cover a list of locations'''
    # Total distance variable
    wave_distance = 0
    # Current location variable
    start_loc = origin_loc
    # Store the route
    list_chemin = []
    list_chemin.append(start_loc)
    while len(list_locs) > 0:  # Loop until all locations are visited
        # Go to the next location
        list_locs, start_loc, next_loc, distance_next = next_location(start_loc, list_locs, y_low, y_high)
        # Update start_loc
        start_loc = next_loc
        list_chemin.append(start_loc)
        # Update the distance
        wave_distance = wave_distance + distance_next
    # Final distance from the last storage location back to the origin
    wave_distance = wave_distance + distance_picking(start_loc, origin_loc, y_low, y_high)
    list_chemin.append(origin_loc)
    return wave_distance, list_chemin

def create_picking_route_cluster(origin_loc, list_locs, y_low, y_high):
    '''Calculate the total distance to cover a list of locations, tracking the longest single leg'''
    # Total distance variable
    wave_distance = 0
    # Maximum distance of a single leg
    distance_max = 0
    # Current location variable
    start_loc = origin_loc
    # Store the route
    list_chemin = []
    list_chemin.append(start_loc)
    while len(list_locs) > 0:  # Loop until all locations are visited
        # Go to the next location
        list_locs, start_loc, next_loc, distance_next = next_location(start_loc, list_locs, y_low, y_high)
        # Update start_loc
        start_loc = next_loc
        list_chemin.append(start_loc)
        if distance_next > distance_max:
            distance_max = distance_next
        # Update the distance
        wave_distance = wave_distance + distance_next
    # Final distance from the last storage location back to the origin
    wave_distance = wave_distance + distance_picking(start_loc, origin_loc, y_low, y_high)
    list_chemin.append(origin_loc)
    return wave_distance, list_chemin, distance_max
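
if __name__ == '__main__':
    # Minimal end-to-end check of the nearest-neighbour routing (toy coordinates,
    # alley bounds from app.py)
    origin = [0, 5.5]
    locations = [[2, 10], [2, 20], [8, 10]]
    distance, route = create_picking_route(origin, locations, 5.5, 50)
    print(distance)  # total walking distance in metres
    print(route)     # visited coordinates, starting and ending at the origin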