Privacy-Preserving Techniques for Process Mining

May 24, 2017 | Author: Gerald Divinagracia | Categories: Data Mining, Business Process Management, Process Mining, Privacy and data protection



IFN600 – Assignment 3 Gerald G. Divinagracia Student ID: n9528491

“This research is not intended to be executed in future project units.”

PROBLEM STATEMENT

“Process mining is an invasive technology for business process improvement initiatives in government agencies and yet, the government is hesitant to use it because it lacks a privacy-preserving technique to obfuscate sensitive information in its analysis process.”

“Performance management in government agencies is construed as the efficient, effective, and rational delivery of public goods and services” (Domingo & Reyes, 2011). It is for this reason that the Philippine government embarked on an ambitious journey to improve its public administration performance, introducing reform programs from 1999 to 2009 (Berman, 2016). Moreover, measuring the success of these programs can be realised by automation through information technology. However, embedding defective processes in information systems simply magnifies their inefficiency (Hammer, 1990), a clear indication that these processes need to be properly designed and architected through Business Process Management.

Every organisation provides products or services to its customers, and these are executed through processes, which are the “chain of events, activities and decisions” that people or software entities implement. Business Process Management (BPM) is the management of these end-to-end chains of events, activities and decisions as they deliver value to the customer or organisation (Dumas, La Rosa, Mendling, & Reijers, 2013).

Building a BPM system is not a trivial matter, and a well-defined framework is needed to ensure that the investment made by the organisation is money well spent. One popular framework, designed by Dumas et al. (2013), introduces the BPM lifecycle. As shown in Figure 1, its six phases are as follows: process identification, process discovery, process analysis, process redesign, process implementation, and process monitoring and controlling.

Figure 1. BPM Lifecycle

Of the six phases in the BPM Lifecycle, process identification (process architecture) and process discovery (as-is processes) demand the most time and effort, because they entail face-to-face workshops with all the stakeholders involved in the end-to-end business processes and, in effect, many meetings across different departments of the organisation. For the Philippine Department of Foreign Affairs (DFA), this is not an option: it would adversely impact daily operations and, depending on the complexity of the business processes in question, this BPM approach could run for weeks or even months. A much more efficient approach is warranted, and this is where the emergent technology, Process Mining, comes into the picture.

Process Mining is a recent technology in data science that provides the tools and techniques for deriving evidence-based insights from event logs collected from the organisation’s information systems. It is an innovation from the Business Process Management (BPM) domain that “unifies process analysis and management on the one hand, data analysis and mining on the other” (van der Aalst, 2016). Moreover, in 2011, the Institute of Electrical and Electronics Engineers (IEEE) created a “manifesto” detailing six (6) guiding principles and eleven (11) challenges for researchers in enhancing process mining technology (Van Der Aalst et al., 2011). In the five years since the IEEE’s declaration, various industries have applied process mining for business process analysis insights: healthcare (Mans, van der Aalst, & Vanwersch, 2015; Mans et al., 2008), banking and finance (Gehrke & Mueller-Wickop, 2010; van der Aalst, van Hee, van der Werf, & Verdonk, 2010; Suriadi, Wynn, Ouyang, ter Hofstede, & van Dijk, 2013), hardware manufacturers (Rozinat, de Jong, Günther, & van der Aalst, 2009) and government municipalities (Gottschalk, Wagemakers, Jansen-Vullers, van der Aalst, & La Rosa, 2009), among others.
It is worth noting that when process mining was implemented at a Dutch government project, the National Public Works Department, privacy issues arose: key information needed for the analysis was taken out, so the features of process mining technology were not fully used (van der Aalst et al., 2007). Process mining could not be fully leveraged because it had no “privacy-preserving” feature, the same limitation and gap that was documented in the 2011 IEEE manifesto. Hiding sensitive process details in data is called “privacy-preserving” and, although extensively used in the data mining domain, it is still a work in progress in process mining technology. Government agencies, particularly the DFA, will likely use process mining as an invasive tool for business process improvement insights as long as it does not compromise sensitive information.

Furthermore, the literature review found no extensive research on one of the challenges in the process mining manifesto, specifically “Challenge No. 7 on privacy preserving techniques” (Van Der Aalst, et al., 2011, p. 190). One journal article used process mining to analyse encrypted event logs, but it relied on a BPM automation tool as an a-priori condition rather than on raw event-log data, which is the focus of this research problem (Irshad et al., 2015). For these reasons, research on a “privacy-preserving process mining technique to obfuscate confidential information” does not yet exist and is therefore novel. With privacy-preserving techniques included as one of its capabilities, government institutions can maximise the features of process mining as an invasive technology for business process insights. It is the objective of this research that, with privacy-preserving process mining techniques, the Philippine Department of Foreign Affairs can finally pursue its process improvement initiatives without exposing or compromising confidential information.

The general research problem looks into the inefficiencies in business processes for passport services at the Philippine Department of Foreign Affairs (DFA), which result in unproductive time and dissatisfied taxpayers. However, the DFA, while wanting to fix the issue, is concerned that any business process initiative is likely to expose confidential information about government employees and their daily activities. As a result, my research study will look into privacy-preserving techniques that obfuscate sensitive information for process mining. In addition, the primary focus will be on the pragmatic application of this new technique to government agencies as they embark on business process improvement projects.

The next sections discuss the chosen research question that may provide insights in remediating the overall problem, followed by the research methodology the project will use, a comprehensive enumeration of the expected outcomes of using that methodology, and finally, a discussion of the new knowledge uncovered.

RESEARCH QUESTION

“How can privacy-preserving data mining techniques be leveraged for process mining?”

Data mining is a well-researched domain that merges the three worlds of databases, statistics and artificial intelligence (Lindell & Pinkas, 2002). In the information age, data are stored in massive volumes, and understanding them entails extracting useful information or knowledge. In contrast to statistics, where hypotheses are established first before searching for possible causes of problems, data mining performs its pattern analysis without an a-priori condition. Furthermore, one concern with the collection of large amounts of data is that of confidentiality.
A case in point is medical information systems, where highly sensitive information about patients is stored in databases and, by law, this information needs to be obfuscated. In contrast, there are also situations where sharing databases or infrastructure may prove advantageous to all parties. An example is cloud-based IaaS (Infrastructure as a Service), SaaS (Software as a Service) or PaaS (Platform as a Service), where sharing resources yields “economies of scale” and thereby reduces costs for both consumers and providers. Despite the potential benefits, these cannot be fully realised because confidential information would be shared. For these reasons, privacy-preserving techniques have been extensively researched and applied in data mining to encrypt information as it is stored in common infrastructure or databases.

Process mining and data mining are closely related: the former integrates “process analysis and management on the one hand, data mining and analysis on the other hand” (Aalst, 2011). In other words, process mining is similar enough to data mining that its privacy-preserving techniques may be leveraged, so that process mining analysis can proceed without data confidentiality concerns. The main difference between the two is the kind of data and algorithm used in the analysis. Moreover, a comprehensive literature review by Aldeen, Salleh, and Razzaque (2015) provided substantial evidence that privacy-preserving techniques have been extensively researched in the data mining domain and, as shown in Table 1, six (6) techniques can possibly be leveraged for process mining analysis.

Related Literature on Privacy-Preserving Data Mining (PPDM)

No. | Authors | Fundamental Concepts
1 | Stan Matwin | Reviewed and analysed existing PPDM methods in the industry (Matwin, 2013).
2 | Dinusha Vatsalan, Peter Christen, Vassilios S. Verykios | Introduced the idea of linking records or data with the technique called "privacy-preserving record linkage" (PPRL). Additionally, the authors presented a taxonomy of PPRL techniques (Vatsalan, Christen, & Verykios, 2013).
3 | Xinjun Qi, Mingkui Zong | Reviewed existing PPDM techniques in the industry and focused on the "data distortion method" and "privacy data release". The authors likewise introduced ways to measure the strengths of PPDM and the difficulties involved in executing the techniques (Qi & Zong, 2012).
4 | R. Mukkamala, V.G. Ashok | Introduced "fuzzy-based mapping techniques" for obfuscating data sets for data mining. They contributed in four areas of PPDM: (1) modification of the fuzzy function definition, (2) seven methods to combine data fields into a single value, (3) a similarity technique for comparing mapped data sets with the original, and (4) measurement of the effectiveness of the data mapping technique (Mukkamala & Ashok, 2011).
5 | R. Raju, R. Komalavalli, V. Kesavkumar | Introduced the "Add to Multiply" protocol based on homomorphic encryption techniques (Raju, Komalavalli, & Kesavakumar, 2009).
6 | Lukas Malina, Jan Hajny | Provided a PPDM solution using cloud services by "anonymous login" of users, thereby obfuscating any traces of online activities to a particular user (Malina & Hajny, 2013).

Table 1. Top Six PPDM References

Answering this question will benefit the research substantially: it offers high impact, relevance, and a faster time-to-market, because the research will not start from scratch in answering the research problem but will instead leverage best practices from data mining. In brief, since process mining is related to data mining, technological innovations and learnings in one domain can be leveraged for further research in the other. Finally, privacy-preserving process mining does not yet exist. Although the research conducted by Irshad et al. (2015) leveraged process mining to reverse-engineer obfuscated information output from an enterprise information system, that work did not cover obfuscating event logs from process execution, a gap that this research study will address. Privacy-preserving techniques in process mining are a novel area of research that large organisations and government institutions can leverage.

RESEARCH METHODOLOGY and EXPECTED OUTCOMES

The research question gives direction to how this study will be conducted. Given that techniques from Data Mining will be leveraged by designing, developing and delivering a novel privacy-preserving technique for process mining, the overall approach to answering the research question is an “artefact-oriented” research methodology. According to the literature, research that produces new and innovative artefacts falls within the domain of design-science, which aims to extend organisational and human capabilities (Hevner, March, Park, & Ram, 2004) through designing new information systems. A further analysis of different design-science research methodologies, conducted by Peffers, Tuunanen, Rothenberger, and Chatterjee (2007), stressed that design-science is an accepted methodology for generating new knowledge. For these reasons, the specific design-science methodology this research study will use is Design Science Research (DSR) by Kuechler and Vaishnavi (2004). In the Lecture Notes in Computer Science proceedings of the 6th International Conference (Sinha & Vitharana), twenty-nine (29) examples of successful design-science papers were presented, and these are generally aligned with the DSR Process Model (Figure 2).

Figure 2. Design Science Research Process Model

Furthermore, although the artefacts produced in this research study will include new privacy-preserving process mining algorithm constructs, event-log obfuscation processes, evaluation analysis papers and other supporting documentation necessary for the success of the study, these are not in themselves the output of a design-science research project. Rather, the primary deliverable of this research is design-science knowledge. Given that it will leverage techniques from data mining, this is knowledge of solutions to known problems that will be “leveraged” for the process mining domain. In other words, in the knowledge contribution framework of (Kuechler & Vaishnavi, 2004), this research is well-positioned in the Adaptation quadrant (Figure 3).

Figure 3. DSR Knowledge Contribution Framework

Using DSR as the overall methodology, the implementation starts with a survey of the literature on data mining privacy-preserving techniques, followed by event-log data collection from a government agency, then design science research execution, and finally, research documentation. The following diagrams show the two major components of the project implementation.

Figure 4. Event Data Extraction Process

Event data will be extracted from various sources in a government agency and will undergo the necessary process mining transformation. The objective of this phase is to produce an event log for the obfuscation process in the next phase.
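As a rough illustration of the transformation described above, the following minimal sketch groups raw rows into cases and orders each case's events by timestamp, which is the basic shape of an event log. The column names (application_id, step, officer, logged_at), the sample rows and the rule for dropping noisy rows are all assumptions made for illustration; they are not taken from any DFA system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw rows, as they might be exported from an application table.
# All column names and values are illustrative assumptions.
RAW_ROWS = [
    {"application_id": "P-001", "step": "Verify documents", "officer": "Ana", "logged_at": "2016-03-01 09:20"},
    {"application_id": "P-001", "step": "Receive application", "officer": "Ben", "logged_at": "2016-03-01 09:05"},
    {"application_id": "P-002", "step": "Receive application", "officer": "Ben", "logged_at": "2016-03-01 09:10"},
    {"application_id": "P-001", "step": "Release passport", "officer": "Cara", "logged_at": "2016-03-08 14:00"},
    {"application_id": "P-002", "step": "", "officer": "Ana", "logged_at": "2016-03-01 09:40"},  # noisy row
]


def build_event_log(rows):
    """Group rows into cases and order each case's events by timestamp."""
    cases = defaultdict(list)
    for row in rows:
        if not row["step"]:
            # Simple filtering rule (an assumption): drop rows without an activity name.
            continue
        cases[row["application_id"]].append({
            "activity": row["step"],
            "resource": row["officer"],
            "timestamp": datetime.strptime(row["logged_at"], "%Y-%m-%d %H:%M"),
        })
    for events in cases.values():
        events.sort(key=lambda event: event["timestamp"])
    return dict(cases)


def traces(event_log):
    """Reduce each case to its ordered sequence of activity names."""
    return {case: [e["activity"] for e in events] for case, events in event_log.items()}


event_log = build_event_log(RAW_ROWS)
```

Calling traces(event_log) yields one ordered activity sequence per case, which is the form most process discovery algorithms consume.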

Figure 5. DSR Execution Proper

Figure 5 shows the execution of the DSR process. This is where the privacy-preserving data mining techniques will be tested iteratively against the event logs created in the Data Collection phase. A new algorithm will come out of this phase once the Baseline Metrics match the Result Metrics, leading to further research elaboration for new knowledge analysis.

The following table lists the detailed steps involved in implementing the research project. Each entry gives the Process (Task), followed by the Detailed Instructions (including how data was collected) and the Kinds of Data Collected and Tangible Outcomes.

A. Literature Survey

1. Conduct literature review on the overall research problem
Detailed Instructions: Using the overall research problem, come up with a list of keywords relevant to the study. From the string of keywords identified, use Google Scholar and the QUT Search Engine to conduct an extensive literature review of relevant references needed for the study. Iterate this process once a "detailed question" has been identified to narrow the research scope.
Tangible Outcomes: A document containing a list of at least 5-10 highly relevant references and the questions needed to narrow the research.

B. Data collection

1. Selection of organisation for sourcing of data
Detailed Instructions: Conduct extensive research (e.g. online portal or library class materials) on a target organisation, with sensitive information, with which to conduct the research project. Since the overall research problem looks into government agencies, use this filter to narrow the search. Use the following as selection criteria: (1) has sensitive data, (2) business processes are embedded in information systems, (3) willing to undergo process analysis for continuous improvement, and (4) there is support from upper management for such an initiative.
Tangible Outcomes: A document containing a list of government agencies with corresponding ratings or scores for shortlisting which organisations meet the desired criteria.

2. Set up a meeting with the organisation
Detailed Instructions: Once a government agency has been identified, schedule a courtesy meeting with the organisation's management through the Research Sponsoring Body (e.g. QUT) and explain the high-level objective of the meeting.
Tangible Outcomes: A document containing the agreed meeting schedule, duration and attendees. This can be in e-mail format or a formalised hardcopy document.

3. Present process mining value-proposition
Detailed Instructions: Using the overall research problem and identified question, conduct a presentation to the target government agency regarding the research study, the relevant technologies and where the government agency fits in the picture.
Tangible Outcomes: A presentation material containing the research strategy with pertinent details. An agreement, be it an informal handshake agreement or a formalised documented one, from the government agency signifying an interest in the proposed research project. This could be in the form of an LOI (Letter of Intent) signed by all parties involved after the meeting.

4. Secure contract with organisation for process mining analysis
Detailed Instructions: After the organisation has signified an intent to participate in the research project, create a contract with the help of Legal Services (e.g. QUT Legal) and input all the details about the research project and the participating organisation. Take care to consider all terms and conditions agreed by all parties prior to formalising the contract. Once the contract is complete, provide it to all parties for signature, have the document notarised and distribute copies to all parties.
Tangible Outcomes: A signed contract document with all the research project details.

5. Secure ethical agreement on data collection
Detailed Instructions: With the contract signed, create an Ethical Agreement document detailing what data or information will be collected from the government agency. The document should explicitly mention what data sources will be used for the research and, should additional data sources be needed, include a clause indicating how to deal with such cases.
Tangible Outcomes: A signed Ethical Agreement for Data Collection by all parties involved.

6. Secure ethical agreement on data protection
Detailed Instructions: Besides the ethical agreement on data collection, an agreement on data protection must be created. The agreement will disclose what will be done with the data, how it will be stored and how it will be disposed of.
Tangible Outcomes: A signed Ethical Agreement for Data Protection by all parties involved.

7. Establish non-disclosure agreement with organisation
Detailed Instructions: After the Ethical Agreements, a Non-Disclosure Agreement (NDA) needs to be created to signify that no information (sensitive or not) from the government agency will be disclosed for whatever reason. The NDA should include a clause on how to deal with exceptional cases.
Tangible Outcomes: A signed NDA by all parties involved.

8. Present and agree on overall research plan
Detailed Instructions: With all the necessary documentation signed and distributed, conduct a project kick-off that includes all the parties involved. Present the overall plan, highlighting the following: (1) data sources, (2) key activities, (3) subject-matter experts from the government agency, (4) other resources needed from the government agency, (5) high-level timeline and (6) change process for handling revisions to scope.
Tangible Outcomes: An informal agreement from all the parties involved regarding the overall research project strategy.

9. Identify with organisation what particular business process to analyse
Detailed Instructions: Work with the government agency in identifying what particular business process the research will use. Once the specific business process is identified, work with the concerned departments on what "particular questions need to be answered"; this is to gain an understanding of the flow of the processes and to scope the business process further. Using the BPM methodology of (Rosa, 2016), create BPM notations reflecting the identified end-to-end business process flow. Create BPMN models to the level at which data sources can be detected.
Tangible Outcomes: Signavio BPMN models from Level 1 to Level 5, or to a level that is significant enough to identify the data sources of the processes. A document containing a list of questions to narrow the scope of the business process.

10. Meet and discuss data collection strategy with IT SME from organisation and DB Admin
Detailed Instructions: Armed with the BPMN models, meet with the DB Admin and representatives from the identified department and the IT department of the government agency. Present the BPMN models and discuss the data collection strategy.
Tangible Outcomes: A documented agreement (e.g. Minutes of Meeting) on the data collection strategy.

11. Identify the applications of the specified business process to analyse
Detailed Instructions: Meet with the government agency's IT department and identify the applications or information systems touched by the business process. This is to ensure that the owners of the information systems are aware that a research activity will be conducted against their applications.
Tangible Outcomes: A documented agreement (e.g. Minutes of Meeting) on the concurrence of information system owners.

12. Identify the database sources of the applications involved in the business processes
Detailed Instructions: Once all information systems are identified, list all the database sources needed for the analysis.
Tangible Outcomes: A document containing the applications and the corresponding databases they use.

13. Identify the tables needed for the business processes
Detailed Instructions: From the list of databases, identify which tables are needed for data collection that follows the identified business process.
Tangible Outcomes: A document containing a list of database tables needed for the research. This document could be in the form of a data architecture diagram.

14. Collect data from the identified tables
Detailed Instructions: From the list of tables, identify what specific "fields" are needed for the data collection. Create an Entity-Relationship Diagram (ERD) to capture all data fields needed. Using the ERD, "extract" all the necessary data using RapidMiner (see Figure 4 for the Data Collection flow).
Tangible Outcomes: An ERD containing all the fields needed for data collection. Extracted "raw data" stored in RapidMiner. Historical event logs collected are at minimum three (3) months old.

15. Process mining data preprocessing
Detailed Instructions: Leverage Data Mining pre-processing methods. Ensure that the data fields captured are the sources that traverse each step in the process, in other words, the "event logs". Iteratively check in the next steps whether all necessary data have been collected.
Tangible Outcomes: A document containing the Data Mining pre-processing methods that worked. Extracted data stored in a database called "Unfiltered Event Logs".

16. Filter data by deleting noisy and unwanted information
Detailed Instructions: Using the "Unfiltered Event Logs", conduct the "Data Filtering" process using process mining techniques.
Tangible Outcomes: Filtered data that follows the "event-logs" format (Aalst) with "noisy" and "unwanted information" taken out.

17. Data transformation by identifying what are the case instances for process mining analysis
Detailed Instructions: Conduct further process mining pre-processing by taking the well-formatted event logs and classifying them according to "cases" and "instances", based on the specified business process questions identified in Step 1 of the Data Collection phase.
Tangible Outcomes: Normalised filtered event logs, which are fine-grained and ready for the actual "Design Science Research Execution" phase.

C. Design Science Research (DSR)

1. Awareness of problem
Detailed Instructions: Create a DSR Plan that consolidates the research problem statement and the normalised filtered event logs with cases and instances identified.
Tangible Outcomes: A DSR Plan for the Solution Development Team and the stakeholders who will be involved.

2. Set up a meeting with the organisation
Detailed Instructions: Present the Plan to the Solution Development Team and all stakeholders who will be involved in the execution.
Tangible Outcomes: A document containing the agreement on concurrence with the Solution Development Team and the stakeholders involved in DSR.

3. Software development

3.1 Requirements specification
Detailed Instructions: Armed with the DSR Plan and normalised filtered event logs, discuss with the Solution Development Team and come up with an execution strategy.
Tangible Outcomes: Documented requirements specifications and ground rules for software development processes. Included in the document are: (1) coding standards, (2) automated build processes, (3) version control and configuration management, (4) testing methodology, and (5) deployment steps.

3.2 Design privacy-preserving algorithms for process mining
Detailed Instructions: From the literature review, have the development team extract and configure the "privacy-preserving data mining techniques" for process mining. Run the normalised and filtered event logs through the Process Mining algorithm (e.g. Alpha) and keep the "Baseline Metrics" output as the control for later testing (see Figure 5 for the DSR process flow).
Tangible Outcomes: Documented strategy on what privacy-preserving data mining techniques will be used for the execution proper. Documented "Baseline Metrics" to be used as the control in later testing.

3.3 Apply privacy-preserving techniques on the data
Detailed Instructions: Using the privacy-preserving techniques from Data Mining, obfuscate the "normalised filtered event logs".
Tangible Outcomes: Privacy-Preserved Event Logs stored in the database (see Figure 5).

3.4 Build & Test Algorithms
Detailed Instructions: Using the "privacy-preserved event logs", run the Process Mining algorithm (e.g. Alpha) and document the output as "Result Metrics". Test the "Baseline Metrics" against the "Result Metrics"; if they match, this is a candidate "Privacy-Preserving Process Mining Technique". Otherwise, iterate by running another "Privacy-Preserving Data Mining Technique" for data obfuscation.
Tangible Outcomes: A. Documented sample of results. B. Documented Test Scripts for running the testing process. It is recommended to use three cycles in testing with three separate documentations: (1) normal test, (2) exception test, and (3) error test.

3.5 Continuous circumscription
Detailed Instructions: Iteratively check from the results of the "Build & Test Algorithms" step whether it answers the overall problem and indicates that new knowledge has been discovered.
Tangible Outcomes: Documented result if the overall research problem has been answered; otherwise repeat the development and test process.

3.6 Deploy and test the "privacy-preserving process mining technique" and run the results through the process mining analysis tool
Detailed Instructions: Once an algorithm has matched and is working with process mining, deploy the algorithm and test it for further analysis of its performance and effectiveness. This is conducted by running it through the process mining analysis tool and deriving the results.
Tangible Outcomes: Documented statistical analysis from process mining in terms of efficiency, effectiveness and encryption strength of the obfuscation of event logs while they are processed using ProM, “an Open Source framework for process mining algorithms” (PMG, 2016).

4. Evaluate and conduct retrospection on development processes
Detailed Instructions: Conduct a review with all the parties involved in the project and document lessons learned. Further, conduct a retrospective with the Solution Development Team on what worked and what did not work well in the development process.
Tangible Outcomes: Documented lessons learned and best practices.

5. Establish the development conclusion
Detailed Instructions: Gather all data and information related to the overall DSR implementation. Finalise and document the research findings with supporting quantitative analysis.
Tangible Outcomes: Documented research findings.

6. Contribute to knowledge documentation
Detailed Instructions: Further assess and document the findings regarding their relevance and contribution to the overall research problem.
Tangible Outcomes: Documented research findings with respect to their contribution to new knowledge.

D. Overall Documentation

1. Conduct research documentation from design, analysis and result
Detailed Instructions: Gather all documentation from the Literature Review, Data Collection and DSR Execution and start documenting the Research proper.
Tangible Outcomes: A document that details everything about the research, from plan to execution and result.

2. Establish new knowledge and include in documentation
Detailed Instructions: Document the new knowledge generated and provide details of its analysis.
Tangible Outcomes: Documented new knowledge.

3. Peer review paper for conformance and approval
Detailed Instructions: Present the Research Paper to the sponsor. Once approved, present it to the government agency, particularly how the organisation can use process mining technology for process improvement initiatives in the future.
Tangible Outcomes: Documented approval and successfully peer-reviewed Research Paper.

4. Final submission for approval
Detailed Instructions: Finalise the documentation of the Research Paper for final submission.
Tangible Outcomes: Finalised and fully documented Research Paper.

Table 2. Research Detailed Process Steps
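The baseline-versus-result comparison in the Build & Test Algorithms step of Table 2 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the technique the research will arrive at: keyed hashing (HMAC-SHA256) stands in for whichever PPDM technique is under test and is not one of the techniques listed in Table 1, the directly-follows counts stand in for the "Baseline Metrics" and "Result Metrics" (the Alpha algorithm derives its ordering relations from the directly-follows relation), and the toy log, key and function names are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Toy event log (hypothetical): case id -> ordered list of (activity, resource).
EVENT_LOG = {
    "P-001": [("Receive application", "Ana"), ("Verify documents", "Ben"), ("Release passport", "Cara")],
    "P-002": [("Receive application", "Ana"), ("Verify documents", "Cara"), ("Release passport", "Cara")],
    "P-003": [("Receive application", "Ben"), ("Reject application", "Ben")],
}

# Assumption: a secret key kept by the data owner, never shared with analysts.
SECRET_KEY = b"hypothetical-key-held-by-the-agency"


def pseudonymise(value, key=SECRET_KEY):
    """Replace a sensitive value with a truncated keyed hash (HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def obfuscate(event_log):
    """Pseudonymise case ids and resources; activity names are left intact."""
    return {
        pseudonymise(case): [(activity, pseudonymise(resource)) for activity, resource in events]
        for case, events in event_log.items()
    }


def directly_follows(event_log):
    """Count how often one activity is directly followed by another."""
    counts = Counter()
    for events in event_log.values():
        activities = [activity for activity, _ in events]
        counts.update(zip(activities, activities[1:]))
    return counts


baseline_metrics = directly_follows(EVENT_LOG)           # control, from the original log
result_metrics = directly_follows(obfuscate(EVENT_LOG))  # from the obfuscated log
is_candidate = baseline_metrics == result_metrics
```

In this toy case the metrics match because only case identifiers and resource names are obfuscated. A technique that altered activity names or event ordering would change the directly-follows counts, and the loop in Figure 5 would move on to another technique.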

As for the analysis of raw data in the methodology, these are identified from the highlighted activities in Table 2 and further summarised in the following table: Kind of Data

Types of Data

Sources

Will Qualitative, Quantitative or both be used to analyse the data to answer the research question?

What new information will be gained from the analysis?

Data mining privacypreserving algorithm techniques

Pseudocode / Source codes

Journals, books, case studies and research thesis

Quantitative analysis will be applied on the efficiency and effectivity of each data mining algorithm used for process mining. A data mining algorithm for process mining will be discovered.

Privacy-Preserving data mining algorithms for obfuscating the event logs for process mining analysis.

Data mining preprocessing methods

Narrative Text

Journals, books, case studies and research thesis

Quantitative analysis on the efficiency and effectivity of each data mining preprocessing methods used. A new data mining pre-processing technique will be discovered for process mining.

New data mining pre-processing techniques leveraged for process mining.

Data mining development and testing methodologies

Narrative Text

Journals, books, case studies and research thesis

Quantitative analysis on the efficiency and effectivity of each data mining development and testing methodology used.

New development and testing methodologies on how to develop privacy-preserving process mining techniques.

Event Logs from a government agency's business process

Continuous, Attributes, nominal, categorical (ex. event id, gender of resources, & process activities respectively)

Existing information systems in the government organisation (e.g. DFA); This may include the following: (1) logbooks or manuallydocumented audit trails of processes, (2) interview documentations with process actors (e.g. if event logs are not stored) and other sources of process data, (3) webbased information systems that the business process is currently using, (4) web server logs of the information systems, and (5) database logs of information systems.

Part1: Data Collection in creating the Filtered Event Logs - As the main data source for testing and discovering a privacy-preserving process mining technique, the event log need to undergo a transformation from "coarsegrained" to "fine-grained" scoping. Additionally, noisy data need to be discarded in coming up with a "quantitatively" baselined event logs. Process mining techniques in extracting event logs will be used to ensure "data quality" and fit-for-use. Data collection will follow Process Mining's 12 event data guidelines from (van der Aalst, 2016).

Part 1: Data Collection in creating the Filtered Event Logs - Information on the kinds of event logs created by government agencies. However, the insights here will depend on the process architecture of the organisations: the more extensive the elaboration of the process architecture, the more detailed the event logs.

Part 2: Using the Filtered Event Logs for DSR Execution - Before the Privacy-Preserving Data Mining (PPDM) techniques are used to obfuscate the data, the Filtered Event Logs will be run through the process mining engine for process analysis; the output will become the "control" data used for comparison in later stages. PPDM techniques will then be used to obfuscate the Filtered Event Logs (examples of the capabilities that will be used are specified in Table 1. Top six PPDM Literatures). Once obfuscated, the Filtered Event Logs will be run through the process mining engine again, and the result will be compared to the "control" data. If the results are "equal", this indicates that the PPDM technique used works well with process mining. Otherwise, the search for the right PPDM technique continues iteratively.

- Measuring the quality of the event logs will use four criteria: (1) fitness, (2) generalization, (3) simplicity, and (4) precision (Aalst, 2011).

Part 2: Using the Filtered Event Logs for DSR Execution
- Information from analysis results indicating that process mining event logs may be different from, and more challenging to obfuscate than, the discrete data used in data mining, and that obfuscation may adversely affect process analysis.
- Information results showing how obfuscated data affects the effectiveness and performance of process mining analysis.
- Information results showing that sensitive data from government agencies can be obfuscated for process analysis, thereby leveraging the full capability of process mining in process improvement initiatives.
- Measuring the equivalence of “Baseline Metrics” against “Result Metrics” will use hypothesis testing through one-way ANOVA (Analysis of Variance), which tests the equality of the means of three or more samples at one time (OSSS, 2016).

Table 3. Analysis of Raw Data and New Information Gained
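The Part 2 comparison described in Table 3 (analyse the Filtered Event Logs to obtain "control" data, obfuscate the logs, analyse them again, and compare) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny event log, the resource names, and the choice of salted hashing as the obfuscation step are hypothetical, and salted hashing is a simple pseudonymisation stand-in, not one of the PPDM techniques from Table 1. The directly-follows counts stand in for the output of a process mining engine.

```python
import hashlib
from collections import Counter

# Hypothetical filtered event log: (case id, activity, resource), in time order.
FILTERED_LOG = [
    ("c1", "Receive application", "Ana"),
    ("c1", "Verify documents", "Ben"),
    ("c1", "Release passport", "Ana"),
    ("c2", "Receive application", "Ben"),
    ("c2", "Verify documents", "Ben"),
    ("c2", "Release passport", "Carla"),
]


def pseudonymise(log, salt):
    """Replace each resource name with a salted SHA-256 pseudonym.

    This is an illustrative stand-in for a PPDM technique: the same
    resource always maps to the same pseudonym, so linkages are kept.
    """
    def mask(name):
        digest = hashlib.sha256((salt + name).encode("utf-8")).hexdigest()
        return "res_" + digest[:8]

    return [(case, activity, mask(resource)) for case, activity, resource in log]


def directly_follows(log):
    """Count how often one activity directly follows another within a case."""
    last_activity = {}
    counts = Counter()
    for case, activity, _resource in log:
        if case in last_activity:
            counts[(last_activity[case], activity)] += 1
        last_activity[case] = activity
    return counts


# "Control" data: analysis of the unobfuscated Filtered Event Logs.
control = directly_follows(FILTERED_LOG)

# Obfuscate, analyse again, and compare against the control.
obfuscated_log = pseudonymise(FILTERED_LOG, salt="demo-salt")
result = directly_follows(obfuscated_log)
same_control_flow = control == result

print("Control-flow preserved:", same_control_flow)
```

In this sketch only the resource field is masked, so the control-flow comparison is expected to match. Obfuscating activities or timestamps would be a harder case, and a salted hash over a small set of names can be reversed by anyone who learns the salt, so a real study would need a stronger technique.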

Additionally, the research will take roughly 12 months from the Literature Review to the final submission of the Research Findings (see Figure 6 for the details).

Figure 6. Research Project High-Level Timeline

The following resources will be needed to conduct the research:
- 1 researcher (performing as project manager)
- 1 data analyst (pre-processing of raw data)
- 1 software developer (algorithm development, testing and analysis)
- 1 statistician (data sampling and analysis)
- 1 subject matter expert (SME) on business processes from the government agency
- 1 information technology SME from the government agency

Finally, here is a list of special resources and equipment needed for the research:
- Four (4) high-end laptops, one per team member (researcher, data analyst, software developer and statistician), with the following specifications:
  o Windows Operating System (latest version)
  o 1 TB hard disk drive
  o 8 GB RAM
  o Microsoft Office (latest version)
  o Microsoft Project (latest version)
- 2 licenses for RapidMiner software for data collection and analysis
- 2 licenses for the Disco process mining analysis tool
- 1 license for SigmaXL for statistical analysis
- 2 open-source licenses of the ProM process mining analysis tool

NEW KNOWLEDGE

After a year of implementing the research study and iterating thoroughly on privacy-preserving data mining techniques to uncover innovative solutions for process mining, the new knowledge delivered is: “Provides insights towards how leveraging secure process mining capabilities can change government agency behaviour towards process improvement initiatives.” Despite the advent of “open data”, where government institutions openly share information with the public, there is still a plethora of privacy-restricted and confidential data (Janssen, Charalabidis, & Zuiderwijk, 2012) that needs to be managed. This is information that government institutions cannot readily share with the public because it may compromise public servants or expose operational details about government processes. However, as with commercial organisations, government agencies need to continuously improve the way they deliver services to the public, and one way to do that is through process quality improvement initiatives. As the main theme of this research, process mining is one of the process improvement technologies that can analyse an organisation’s performance through evidence-based information rather than through lengthy interviews and surveys. The caveat in using process mining is that it currently does not have a “privacy-preserving” technique to obfuscate sensitive data, a feature that government agencies would want before fully leveraging the capabilities of process mining in any process improvement initiative. For these reasons, this research study will attempt to address that gap.

This project aims to provide a new understanding: that having a secure process mining capability will change the way government agencies look at improving their process performance. Additionally, once it is known that sensitive information concerning the identities of employees and their activities in the organisation is not exposed, the focus of the process improvement effort will shift from a “people-oriented” to a “process-oriented” way of thinking and, consequently, take away the “blame-the-doer” stigma from the people. Moreover, the shift in mindset ensures that the organisation is open to the change needed to be a successful service provider. Furthermore, as described by (Brocke & Rosemann, 2014), this shift in organisational culture will lead to a better appreciation of business process management as an enabler of, rather than a barrier to, the organisation’s success. A case in point is the business process improvement project at the Dutch National Public Works Department in the Netherlands, where process mining was implemented: the authors were not able to provide the full insights of the study to the proponents because of “privacy issues, labour contracts, and other agreements” that were linked to process performers in the organisation, thereby limiting the organisation’s ability to learn from its areas for improvement (van der Aalst, et al., 2007). In other words, findings from this research will give government agencies the capability to realise the full potential of business process improvement by leveraging a secure process mining technology. Besides government agencies, commercial institutions can leverage the new knowledge, especially when dealing with confidential information across a shared infrastructure. A case in point is Salesforce.com, a cloud-based software service provider that offers its customers a common infrastructure and process model to run their core businesses (Pourmasoumi & Bagheri, 2016).
Salesforce.com can leverage the new knowledge by conducting business process improvement efforts without objections from its customers, who will know that their confidential business information is not compromised in any improvement initiative. Furthermore, this customer confidence may bring Salesforce.com more loyal subscribers and, potentially, more business opportunities. In the process of uncovering the new knowledge, two further findings may be encountered as a side effect. Although they are not a focus of this research study, they are worth noting:
a.) "A syntactic and semantic realisation of how obfuscation of process mining event logs can affect process analysis."
b.) "Provide insights into what makes process mining a challenge in terms of securing data fields for process analysis."
The first finding focuses on the effects of running process mining with obfuscated event logs as input and measuring those effects using various metrics criteria. Insights from this research may uncover better or new techniques for mining process procedures. The second finding focuses on securing data fields in event logs without compromising data linkages or relationships. If pursued as a separate research agenda, these two findings will uncover new knowledge about process analysis and how it can further Business Process Management as a specific domain of study in Information Systems. Finally, it is highly recommended that these two findings be used for future research initiatives, as they will help in understanding process mining as an enabler of process improvement initiatives.

References

Aalst, W. v. d. (2011). Process mining: discovery, conformance and enhancement of business processes (1st ed.). Berlin; Heidelberg; New York: Springer.
Aldeen, Y. A. A. S., Salleh, M., & Razzaque, M. A. (2015). A comprehensive review on privacy preserving data mining. SpringerPlus, 4(1), 1-36. doi:10.1186/s40064-015-1481-x
Berman, E. M. (2016). Public Administration in Southeast Asia: Thailand, Philippines, Malaysia, Hong Kong, and Macao: CRC Press.
Brocke, J. v., & Rosemann, M. (2014). Handbook on Business Process Management 2: Strategic Alignment, Governance, People and Culture: Springer Publishing Company, Incorporated.
Domingo, M., & Reyes, D. (2011). Performance management reforms in the Philippines. Public administration in Southeast Asia: Thailand, Philippines, Malaysia, Hong Kong, and Macao, 397-422.
Dumas, M., La Rosa, M., Mendling, J., & Reijers, H. (2013). Fundamentals of business process management. New York; Berlin: Springer.
Gehrke, N., & Mueller-Wickop, N. (2010). Basic Principles of Financial Process Mining A Journey through Financial Data in Accounting Information Systems. In AMCIS (pp. 289).
Gottschalk, F., Wagemakers, T. A., Jansen-Vullers, M. H., van der Aalst, W. M., & La Rosa, M. (2009). Configurable process models: Experiences from a municipality case study. In International Conference on Advanced Information Systems Engineering (pp. 486-500): Springer.
Hammer, M. (1990). Reengineering work: don't automate, obliterate. Harvard business review, 68(4), 104-112.
Hevner, A., March, S. T., Park, J., & Ram, S. (2004). Design science in information systems research. MIS quarterly, 28(1), 75-105.
Irshad, H., Shafiq, B., Vaidya, J., Bashir, M. A., Shamail, S., & Adam, N. (2015). Preserving privacy in collaborative Business Process Composition. In (Vol. 4, pp. 112-123): SCITEPRESS.
Janssen, M., Charalabidis, Y., & Zuiderwijk, A. (2012). Benefits, adoption barriers and myths of open data and open government. Information Systems Management, 29(4), 258-268.
Kuechler, B., & Vaishnavi, V. (2004). Design Science Research in Information Systems.
Lindell, Y., & Pinkas, B. (2002). Privacy Preserving Data Mining. Journal of Cryptology, 15(3), 177-206. doi:10.1007/s00145-001-0019-2
Malina, L., & Hajny, J. (2013). Efficient security solution for privacy-preserving cloud services. In Telecommunications and Signal Processing (TSP), 2013 36th International Conference on (pp. 23-27): IEEE.
Mans, R., Schonenberg, H., Leonardi, G., Panzarasa, S., Cavallini, A., Quaglini, S., & van der Aalst, W. (2008). Process mining techniques: an application to stroke care. Studies in health technology and informatics, 136, 573.
Mans, R. S., van der Aalst, W., & Vanwersch, R. J. (2015). Process Mining in Healthcare: Evaluating and Exploiting Operational Healthcare Processes: Springer.
Matwin, S. (2013). Privacy-preserving data mining techniques: survey and challenges. In Discrimination and Privacy in the Information Society (pp. 209-221): Springer.
Mukkamala, R., & Ashok, V. G. (2011). Fuzzy-based methods for privacy-preserving data mining. In Information Technology: New Generations (ITNG), 2011 Eighth International Conference on (pp. 348-353): IEEE.
OSSS. (2016). Certified Lean Six Sigma Green Belt Book.
Peffers, K., Tuunanen, T., Rothenberger, M. A., & Chatterjee, S. (2007). A design science research methodology for information systems research. Journal of management information systems, 24(3), 45-77.
PMG. (2016). Process Mining. Retrieved from http://www.processmining.org/
Pourmasoumi, A., & Bagheri, E. (2016). Business Process Mining. arXiv preprint arXiv:1607.00607.
Qi, X., & Zong, M. (2012). An overview of privacy preserving data mining. Procedia Environmental Sciences, 12, 1341-1347.
Raju, R., Komalavalli, R., & Kesavakumar, V. (2009). Privacy maintenance collaborative data mining-a practical approach. In 2009 Second International Conference on Emerging Trends in Engineering & Technology (pp. 307-311): IEEE.
La Rosa, M. (2016). Marcello La Rosa shares best practices of building Business Process Architecture. On Best Practices of Building Business Process Management. Retrieved from http://bpmtips.com/marcello-la-rosa-shares-best-practices-of-building-business-process-architecture/
Rozinat, A., de Jong, I. S., Günther, C. W., & van der Aalst, W. M. (2009). Process mining applied to the test process of wafer scanners in ASML. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 39(4), 474-479.
Sinha, H. J. A. P., & Vitharana, P. Service-Oriented Perspectives in Design Science Research.
Suriadi, S., Wynn, M. T., Ouyang, C., ter Hofstede, A. H., & van Dijk, N. J. (2013). Understanding process behaviours in a large insurance company in Australia: A case study. In International Conference on Advanced Information Systems Engineering (pp. 449-464): Springer.
van der Aalst, W. M., van Hee, K. M., van der Werf, J. M., & Verdonk, M. (2010). Auditing 2.0: using process mining to support tomorrow's auditor. Computer, 43(3), 90-93. Retrieved from http://ieeexplore.ieee.org/ielx5/2/5427360/05427384.pdf?tp=&arnumber=5427384&isnumber=5427360.
van der Aalst, W. (2016). Process Mining: Data Science in Action: Springer.
van der Aalst, W., Adriansyah, A., De Medeiros, A. K. A., Arcieri, F., Baier, T., Blickle, T., . . . Buijs, J. (2011). Process mining manifesto. In International Conference on Business Process Management (pp. 169-194): Springer.
van der Aalst, W. M., Reijers, H. A., Weijters, A. J., van Dongen, B. F., De Medeiros, A. A., Song, M., & Verbeek, H. (2007). Business process mining: An industrial application. Information Systems, 32(5), 713-732. Retrieved from http://ac.elscdn.com/S0306437906000305/1-s2.0-S0306437906000305-main.pdf?_tid=3de4b496667b-11e6-a02c00000aab0f02&acdnat=1471659198_b18d2181cf325997263fb8940641561c.
Vatsalan, D., Christen, P., & Verykios, V. S. (2013). A taxonomy of privacy-preserving record linkage techniques. Information Systems, 38(6), 946-969.
