#1 Created event_processing file with 3 methods to run analysis #2
base: master
Conversation
… csv file Signed-off-by: Connor Narowetz <[email protected]>
Signed-off-by: Connor Narowetz <[email protected]>
- Added config_file and action argument - Run chosen action Signed-off-by: Connor Narowetz <[email protected]>
- Added Python notebook with all functionality.
- Deleted .py script to have notebook replace - Changed name of notebook
The following was added: - Example section at bottom of notebook - Petri Net model - Performance Models - BPMN filtered model - Cleaned up wording - Images folder Signed-off-by: Connor Narowetz <[email protected]>
carlosparadis
left a comment
Hi Connor,
This needs substantial revision. I did not review the entirety of the notebook because the narrative is somewhat confusing.
The overall structure of the notebook should be something like this:
- You introduce the user to GitHub Events. You explain you can obtain them with Kaiaulu. You motivate your code, saying you are interested in seeing if developers often follow a similar process. This is all brief, the details go in subsequent sections.
- Right after the introduction, you explain the overall folder organization between kaiaulu, process mining, and the raw data exec will give to you.
- GitHub Events: You minimally explain that Kaiaulu organizes its features in functions, which are used in Notebooks and execs. For details on how to construct the config file for events, you defer the user to the Events Notebook (do ensure the notebook actually explains what you claim it does).
- Explain that by running the config, Kaiaulu will save the files to rawdata (specify the exact folder). Don't create a sub-section for this.
- After executing the script, simply load the table in pandas, show the user what it looks like, and explain it. Again, no need to keep making sub-sections here; the GitHub Events section suffices.
- Proceed to explain and re-motivate your work: that it is hard to guess whether people are following the same process.
- Process Mining GitHub Events:
- In this new section, begin by explaining you will demonstrate it in practice. Using the already loaded pandas dataframe of the real dataset, apply the 3 process mining functions.
By this point, the user should be able to use your code to their needs. What you then need is a section that explains experiments 1, 2 and 3. Where are the fake data generators here?
Also, where is the api folder? Where is the markdown? This does not look like the most recent version of the code. I still see functions being defined in the notebook.
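To make the suggested "load the table in pandas and show the user" step concrete, here is a minimal sketch. The file name and column names (`issue_id`, `event`, `actor`, `created_at`) are illustrative assumptions, not the actual schema Kaiaulu writes to rawdata; an in-memory CSV stands in for the downloaded file.

```python
import io
import pandas as pd

# Stand-in for the CSV the Kaiaulu exec writes to rawdata/
# (path and column names here are hypothetical, not the real schema).
raw = io.StringIO(
    "issue_id,event,actor,created_at\n"
    "1,opened,alice,2024-01-01T09:00:00Z\n"
    "1,labeled,bob,2024-01-01T10:30:00Z\n"
    "1,closed,alice,2024-01-02T16:00:00Z\n"
)
events = pd.read_csv(raw, parse_dates=["created_at"])
print(events.head())                     # show the user what the table looks like
print(events["event"].value_counts())    # quick sense of the activity mix
```

In the notebook, `raw` would simply be the path to the rawdata CSV.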
Also:
Where is the env.yml conda file?
images/occurrence_dfg.png
Outdated
Do not version binary files. Generate them on the fly in the code so users generate it themselves when they run your notebook.
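The "generate on the fly" pattern can be sketched as follows. This is a hedged illustration: the function name and the `issue_id`/`event`/`created_at` columns are assumptions, and the pm4py calls (`format_dataframe`, `discover_dfg`, `save_vis_dfg`) assume pm4py and a graphviz backend are available per the project's env.yml.

```python
import os

def regenerate_dfg_png(df, out_path="images/occurrence_dfg.png"):
    """Rebuild the DFG figure at notebook runtime instead of versioning the PNG.

    Column names (issue_id, event, created_at) are assumptions about the
    parsed event table, not the project's actual schema.
    """
    try:
        import pm4py  # assumed present per env.yml
    except ImportError:
        return None  # pm4py not installed; nothing to draw
    log = pm4py.format_dataframe(
        df, case_id="issue_id", activity_key="event", timestamp_key="created_at"
    )
    dfg, start, end = pm4py.discover_dfg(log)
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    try:
        pm4py.save_vis_dfg(dfg, start, end, out_path)
    except Exception:
        return None  # graphviz backend unavailable
    return out_path
```

Each image the notebook shows would then be produced by a call like this, so nothing binary needs to be committed.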
images/performance_dfg.png
Outdated
Do not version binary files. Generate them on the fly in the code so users generate it themselves when they run your notebook.
images/petri_net.png
Outdated
Do not version binary files. Generate them on the fly in the code so users generate it themselves when they run your notebook.
images/process_graph.png
Outdated
Do not version binary files. Generate them on the fly in the code so users generate it themselves when they run your notebook.
images/process_tree.png
Outdated
Do not version binary files. Generate them on the fly in the code so users generate it themselves when they run your notebook.
issue_event_processing.ipynb
Outdated
| "cell_type": "markdown", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "### Download and Parse data with Kaiaulu ghevents.R (CLI)\n", |
rename to: GitHub Events
issue_event_processing.ipynb
Outdated
| "cwd = os.getcwd()\n", | ||
| "os.chdir(os.path.expanduser(\"~/Desktop/Kaiaulu/Working_issues/kaiaulu/\"))\n", | ||
| "\n", | ||
| "# To download use the download command specifing the <config_file> <github_token>\n", |
This is too many sub-sections. Be consistent so the section division doesn't get too specific, or it defeats the purpose of having sections.
Change this to be a sub-section and enumerate the sections (it is hard on the eye to tell header from sub-header when they are not close together, so use 1., 1.1, etc.). Rename to: GitHub Events using Kaiaulu
issue_event_processing.ipynb
Outdated
| "\n", | ||
| "# To download use the download command specifing the <config_file> <github_token>\n", | ||
| "\n", | ||
| "command = f\"Rscript exec/ghevents.R download conf/kaiaulu.yml --token_path=~/.ssh/github_token\"\n", |
This is not a good path. It assumes the user is running the exec from inside the Kaiaulu project. Rather, the reasonable assumption is that the user downloaded your code (process_mining) and wants to run the exec from Kaiaulu. So the current working directory should assume the user is at the process_mining folder, goes one level above, then accesses the kaiaulu folder.
What I recommend you do is to explain the expected folder organization of the projects before throwing paths left and right. Assume the following:
kaiaulu/kaiaulu/exec/ghevents.R
kaiaulu/rawdata/
process_mining/notebooks/issue_event_processing.ipynb
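Under the folder layout above, the relative-path resolution could be sketched like this (the helper name is hypothetical; only the directory layout comes from the review):

```python
import os

def kaiaulu_exec_dir(notebook_dir):
    """Resolve kaiaulu/kaiaulu (where exec/ghevents.R lives) relative to
    process_mining/notebooks, per the assumed layout:
        kaiaulu/kaiaulu/exec/ghevents.R
        kaiaulu/rawdata/
        process_mining/notebooks/issue_event_processing.ipynb
    """
    return os.path.normpath(
        os.path.join(notebook_dir, "..", "..", "kaiaulu", "kaiaulu")
    )

# From process_mining/notebooks/ the exec lives two levels up:
# kaiaulu_exec_dir("/home/u/process_mining/notebooks") -> "/home/u/kaiaulu/kaiaulu"
```

The notebook would call `kaiaulu_exec_dir(os.getcwd())` before building the `Rscript` command, so no user-specific absolute path is hardcoded.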
issue_event_processing.ipynb
Outdated
| "\n", | ||
| "As stated above it is reccomended you start with only a few event issues. To do this you can open the created issue_output.csv with Excel or Google Sheets and modify the table to only include a few. \n", | ||
| "\n", | ||
| "Note: commit_output.csv has been implemented for furture development and is note currently used for any process modeling. " |
I have no idea what is going on here. These should not be the file names that Kaiaulu will download the rawdata as. How do you go from the rawdata Kaiaulu downloaded to these csvs? Why are these two tables being passed as input here? Are both being generated by ghevents download? That sounds odd.
issue_event_processing.ipynb
Outdated
| "source": [ | ||
| "#### Execute Parse Command\n", | ||
| "\n", | ||
| "As stated above it is reccomended you start with only a few event issues. To do this you can open the created issue_output.csv with Excel or Google Sheets and modify the table to only include a few. \n", |
This is not useful. The user will not be able to create the files by hand with just this as instruction. Did you not say you created the fake generators for the examples? Where are they located? That is what they were for here.
This version includes the following: - A separate API folder with process mining functions and helper functions - Function comments added to support pdoc format Signed-off-by: Connor Narowetz <[email protected]>
- Changed modify_event_in_csv function comment Signed-off-by: Connor Narowetz <[email protected]>
carlosparadis
left a comment
Here are some more comments on your API. I may need to do one more pass after all the changes to check the overall consistency of which file has which functions in the API, and how they are used in the Notebook after all the changes.
Thanks!
api/csv_generator.py
Outdated
fake = Faker()

# Function to generate fake data
def generate_csv_file(num_issues=1, num_events_per_issue=7, output_csv="generated_csv", seed=2):
Notice the name of your function vs the title of the docs: the more appropriate name here is generate_fake_event_log. generate_csv_file is too broad; any logic could be doing this.
api/csv_generator.py
Outdated
from datetime import datetime, timedelta

# Initialize faker
fake = Faker()
API files should not be initializing objects. This is function definition only.
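The fix is to construct the instance inside the function, so importing the API module has no side effects. A minimal sketch of the pattern, using stdlib `random` as a stand-in for Faker (function name and activity list are hypothetical):

```python
import random

# No module-level state: importing this API file does nothing but define names.
def generate_fake_event_log(num_issues=1, num_events_per_issue=7, seed=2):
    """Sketch of the pattern; the real file would build its Faker() here,
    inside the function, rather than at module scope."""
    rng = random.Random(seed)  # instance created at call time, not import time
    activities = ["opened", "assigned", "labeled", "commented", "closed"]
    return [
        {"issue_id": i, "event": rng.choice(activities)}
        for i in range(num_issues)
        for _ in range(num_events_per_issue)
    ]
```

Seeding through a local `random.Random(seed)` (or `Faker.seed(seed)` inside the function) also keeps runs reproducible without global state.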
api/csv_generator.py
Outdated
@@ -0,0 +1,78 @@
import random
import pandas as pd
from faker import Faker
Nice finding!
api/csv_generator.py
Outdated
# Function to generate fake data
def generate_csv_file(num_issues=1, num_events_per_issue=7, output_csv="generated_csv", seed=2):
"""
Generates fake event log data and saves it to a CSV file.
Expand description so it explains what fake set of activities is happening. Given the poster feedback, you may want to do some adjustments to reflect a Kaiaulu workflow.
api/process_visual_generation.py
Outdated
## Features

This module provides functions to:
Non-comment English in a .py file. Not sure this will work.
api/process_visual_generation.py
Outdated
|
|
||
| ## Example | ||
|
|
||
| ```python |
Not sure what is going on here. Is this supposed to be a ipynb?
api/process_visual_generation.py
Outdated
rename file to process_discovery.py ; stay consistent to the field terminology
api/process_visual_generation.py
Outdated
import pm4py


def start_end_activities(csv_path):
Not sure this belongs in process_discovery.py. Seems event-log related? Maybe a file for event-log manipulation would make more sense, or just place it in io.py
api/process_visual_generation.py
Outdated
| """ | ||
| Reads an event log from a CSV file and returns its start and end activities. | ||
|
|
||
| Assumes the CSV has the following columns: |
Reference the Python function that creates the .csv instead; that function contains the specification of the CSV it returns. Have the user know what function to use, rather than explaining it as if they will code it! Remember Kaiaulu has \code{\link{function_name}} in R. Find the equivalent in Python.
You should also see itm0 for the "See Also" section so you can reference on the returns what functions will use the output of another.
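For pdoc (which this PR uses for its API docs), the rough counterpart of R's \code{\link{}} is a backticked qualified name in the docstring, which pdoc renders as a link. A sketch, with the referenced function name taken from this PR and the body elided:

```python
def start_end_activities(csv_path):
    """
    Return the start and end activities of an event log.

    The CSV is expected in the format written by
    `csv_generator.generate_csv_file`; see that function for the
    column specification.
    """
    raise NotImplementedError  # body elided: the point here is the linked docstring
```

The same backtick convention works for a "See Also"-style paragraph listing which functions consume each return value.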
- Notebook now starts the user from the beginning - Uses relative paths - Includes three "experiments" - API now asks user action 'view' 'save' or 'both' - Renamed API functions - Includes env.yml
- Corrected relative path and api import
carlosparadis
left a comment
Just need to patch the env file. The rest looks pretty good. Please check that the env file of the sentiment project is also updated.
Please change this for me. What you did was export the env.yml from conda. This hardcoded the exact versions, down to the commit hash, of your environment, and also added the entire indirect dependency tree of libraries you don't even use.
You want to create this file manually and list as dependencies only the libraries you actually import.
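A hand-written env.yml in that spirit might look like the following. The dependency list is an assumption inferred from imports seen in this PR (pandas, pm4py, faker); adjust to the actual imports:

```yaml
name: process_mining
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pandas
  - pm4py
  - faker
  - jupyter
```

Pinning only the Python minor version keeps the file portable while conda resolves compatible versions of the direct dependencies.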
env.yml
Outdated
@@ -0,0 +1,548 @@
name: base
name it process_mining
| "metadata": {}, | ||
| "source": [ | ||
| "# Introduction\n", | ||
| "Event logs are the foundation of process mining. They capture records of activities within a system, providing information about when actions occur and what those actions are. For example, in GitHub Issue Events, actions such as assigning users, labeling issues, and closing issues are recorded. Together, these events tell the full story of the process from start to finish. Event logs can be transformed into differnt process graphs, which visually represent the flow of activities and how they connect. These graphs make it easier to identify inefficiencies, bottlenecks, and deviations from expected workflows. They provide valuable insights for process improvement and optimization. \n", |
Typo in "differnt". Please run a spell checker.
- env.yml changed - file paths in notebook changed
- Added full env with versions
Signed-off-by: Connor Narowetz <[email protected]>
Methods:
Notes: Installs pm4py if not already installed.