Sunday, July 21, 2019

Privacy-handling Techniques and Algorithms for Data Mining

VIVEK UNIYAL

ABSTRACT

Data mining can extract previously unknown patterns from vast collections of data. Rapid advances in networking, hardware and software technology have led to outstanding growth in the amount of data collected. Organizations hold huge amounts of data from many heterogeneous databases, and this data includes the private and sensitive information of individuals. Data mining extracts novel patterns from such data, and these patterns can be used for decision making in various domains. However, the output of data mining can also reveal sensitive, private or personal information about a particular person. Such information can be misused, and that misuse can harm the data owner. Privacy is therefore becoming an important issue in many applications of data mining, particularly in distributed environments. Privacy-preserving data mining (PPDM) techniques provide a new direction for solving these issues. With PPDM, we can obtain valid data mining results without learning the underlying data values.

In this dissertation we introduce two algorithms that address privacy handling. One is k-anonymization, in which the information corresponding to any individual in the released data cannot be distinguished from that of at least k-1 other individuals whose information also appears in the released data. To achieve k-anonymity, some values in the database must be suppressed or generalized. K-anonymity corresponds to the record linkage attack model, while l-diversity corresponds to the attribute linkage attack model.

KEYWORDS: Data Mining, Advantages and Disadvantages of Data Mining, Privacy Handling, K-anonymization Algorithm, L-diversity.

ACKNOWLEDGEMENTS

I wish to take this opportunity to express my deep gratitude to all the people who have extended their cooperation in various ways during my dissertation. It is my pleasure to acknowledge the help of all those individuals.
First of all, I would like to express my deepest gratitude to my dissertation supervisor, Mr. Govind Kamboj, without whom none of this would have been possible. He always provided me with the essential direction and advice during the work. I am grateful to him for guiding this dissertation towards its completion. Without his supervision and support, this work would not have been completed successfully in time.

I am grateful to the President, Vice President, Chancellor, Vice Chancellor and Head of the Department of Graphic Era University for providing an excellent environment for work with ample facilities and academic freedom. I would also like to thank the teaching and non-teaching staff for their valuable support during my M.Tech. Last but not least, I am grateful to all my teachers and friends for their cooperation and encouragement throughout this task.

(Vivek Uniyal)
M.Tech (Computer Science Engineering)

TABLE OF CONTENTS

CANDIDATE'S DECLARATION
ABSTRACT
ACKNOWLEDGEMENT
LIST OF ABBREVIATIONS
LIST OF FIGURES
1. INTRODUCTION
   1.1 Problem Statement
   1.2 Overview
   1.3 Advantages of data mining
   1.4 Disadvantages of data mining
   1.5 Why privacy-handling is required in data mining
   1.6 Motivation
   1.7 Organization
2. BACKGROUND AND LITERATURE SURVEY
3. METHODS AND METHODOLOGIES
   3.1 Randomization method
   3.2 Group based anonymization methods
      3.2.1 K-Anonymity framework
      3.2.2 Personalized privacy-preservation
      3.2.3 Utility based privacy-preservation
      3.2.4 Sequential releases
      3.2.5 The l-diversity method
   3.3 Distributed privacy-preserving data mining
   3.4 Detailed description about K-anonymity and l-diversity
      3.4.1 Data collection and Data publishing
      3.4.2 Privacy Data publishing
      3.4.3 Algorithm of k-anonymity
      3.4.4 l-diversity
         3.4.4.1 Lack of diversity
         3.4.4.2 Strong background knowledge
4. EXPERIMENTAL RESULT
   4.1 Introduction
   4.2 Experimental result
      4.2.1 Result of proposed k-anonymity and l-diversity
5. CONCLUSION AND SCOPE FOR FUTURE WORK
   5.1 Conclusion
   5.2 Scope for Future Work
PUBLICATION OUT OF THIS WORK
REFERENCES

LIST OF ABBREVIATIONS

PPDP: Privacy-preserving data publishing
PPDM: Privacy-preserving data mining
QID: Quasi-Identifier

LIST OF FIGURES

Figure 1.1: Data mining as a step in the process of knowledge discovery
Figure 1.2: Typical data mining system architecture
Figure 1.3: Record Owner, Data Collection and Data Publishing
Figure 1.4: Hospital Database
Figure 1.5: Taxonomy tree for JOB, SEX, AGE (QID attributes)
Figure 1.6: Hospital table, original record in database
Figure 1.7: Table of Sensitive record (Publishing data)
Figure 1.8: Table of External Data ppt table
Figure 1.9: Resulting data after linking the sensitive and ppl table
Figure 1.10: Research table (generalized with k-anonymous published data)
Figure 1.11: Extended table (for linking, like a generalized voter list)
Figure 1.12: For checking the k-anonymity
Figure 1.13: Result of linking the table research to extended
Figure 1.14: Hospital original data record Project
Figure 1.15: Comparing the Un-Generalized published and extended data tables
Figure 1.16: Comparing Generalized Extended and Sensitive table records
Figure 1.17: Table for k-anonymity and l-diversity
Figure 1.18: Plotting exact l-value and distinct l-diversity value in Weka
Figure 1.19: Plotting exact l-value and entropy l-diversity value in Weka
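
The k-anonymity condition stated in the abstract (every record must be indistinguishable from at least k-1 others on its quasi-identifier) can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. The function names, the `disease` column and the sample rows are hypothetical and chosen only for illustration; the QID attributes job, sex and age follow the taxonomy tree named in Figure 1.5. The sketch also includes a check for distinct l-diversity, assuming the common definition that each QID group contains at least l distinct sensitive values.

```python
from collections import defaultdict


def group_by_qid(records, qid):
    """Group records that share the same values on all QID attributes."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[attr] for attr in qid)
        groups[key].append(rec)
    return groups


def is_k_anonymous(records, qid, k):
    """True if every QID group contains at least k records."""
    return all(len(g) >= k for g in group_by_qid(records, qid).values())


def is_distinct_l_diverse(records, qid, sensitive, l):
    """True if every QID group has at least l distinct sensitive values."""
    return all(
        len({rec[sensitive] for rec in g}) >= l
        for g in group_by_qid(records, qid).values()
    )


# Hypothetical, already generalized table (made-up data, not from the dissertation).
QID = ("job", "sex", "age")
released = [
    {"job": "Professional", "sex": "Male", "age": "[35-40)", "disease": "Hepatitis"},
    {"job": "Professional", "sex": "Male", "age": "[35-40)", "disease": "Hepatitis"},
    {"job": "Professional", "sex": "Male", "age": "[35-40)", "disease": "HIV"},
    {"job": "Artist", "sex": "Female", "age": "[30-35)", "disease": "Flu"},
    {"job": "Artist", "sex": "Female", "age": "[30-35)", "disease": "HIV"},
    {"job": "Artist", "sex": "Female", "age": "[30-35)", "disease": "HIV"},
]

print(is_k_anonymous(released, QID, 3))                     # True
print(is_k_anonymous(released, QID, 4))                     # False
print(is_distinct_l_diverse(released, QID, "disease", 2))   # True
print(is_distinct_l_diverse(released, QID, "disease", 3))   # False
```

In this made-up table each QID group holds three records, so the table is 3-anonymous, yet each group has only two distinct diseases. That gap is why a table can resist record linkage and still leak a sensitive attribute.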
