Data de-identification is the process of removing or obscuring an individual’s personal details (name, address, phone number, Social Security number) so that data released to third parties cannot be linked back to that person. For instance, data produced by human subject research may be de-identified to preserve the confidentiality of individual research participants. Companies involved in human subject research often use data de-identification to minimize the risk of personal information leaking to the public. Pharmaceutical companies, for example, may de-identify drug trial data to protect the privacy of trial participants. Similarly, hospitals and healthcare facilities may use data de-identification to avoid releasing identifying details about patients to outside parties.
Data De-Identification Methods
Data de-identification has two main methods: static data de-identification and dynamic data masking. The former permanently removes or replaces identifying fields in a copy of the data before that copy is stored or shared. This method works well for data that is exported once, such as a research extract, because the copy no longer contains the original identifiers. The copy is a snapshot, however: if a person moves to a new state, the new address does not appear in the de-identified copy until it is regenerated. Removing names and addresses is also not always enough for people with complex histories, because remaining details such as birth dates and locations can, in combination, still point to one individual.
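As a rough illustration of static de-identification, the sketch below builds a separate copy of a record set with direct identifiers dropped and a keyed-hash token added so that rows for the same person can still be linked. The field names, the `patient_token` column, and the choice of an HMAC over the Social Security number are assumptions made for this example, not a prescribed method.

```python
import hashlib
import hmac

# Hypothetical field names; a real dataset defines its own identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "phone", "ssn"}


def pseudonym(value: str, secret_key: bytes) -> str:
    """Return a keyed hash so the same input always maps to the same token."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify_static(records, secret_key: bytes):
    """Build a new list of records with direct identifiers removed.

    Assumes every record has an "ssn" field to derive the token from.
    The input records are left unchanged.
    """
    output = []
    for record in records:
        clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        clean["patient_token"] = pseudonym(record["ssn"], secret_key)
        output.append(clean)
    return output
```

The secret key has to be stored separately from the de-identified copy, since anyone holding both the key and a list of candidate identifiers could recompute the tokens.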
Dynamic data masking is a more recently emerging form of data de-identification. Rather than altering the stored data, it hides or obscures sensitive fields at the moment they are read, depending on who is requesting them. In hospitals, it is used to reduce the risk of unauthorized access to patient medical records by doctors and other professionals who work in the hospital environment, and the technology has since expanded into the private sector. Data masking is now used in conjunction with HIPAA security measures, which aim to restrict the information that medical professionals can obtain about a patient to what their role requires.
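The idea behind dynamic data masking can be sketched as a read-time filter: the stored record is never modified, and each caller gets a view that depends on their role. The role names, field names, and policy table below are hypothetical and only illustrate the pattern; production systems usually enforce this inside the database or an access layer.

```python
# Hypothetical policy: which fields each role may see in clear text.
VISIBLE_FIELDS = {
    "physician": {"name", "diagnosis"},
    "billing": {"name", "ssn"},
    "researcher": {"diagnosis"},
}

MASK = "****"


def read_record(record: dict, role: str) -> dict:
    """Return a masked view of the record for the given role.

    Unknown roles see every field masked. The stored record is not changed.
    """
    allowed = VISIBLE_FIELDS.get(role, set())
    return {key: (value if key in allowed else MASK) for key, value in record.items()}
```

For example, `read_record(patient, "researcher")` would return the diagnosis in clear text and replace the name and Social Security number with the mask string.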
Data De-Identification with Social Security Numbers
Many records that need de-identification contain identifiers that individuals supplied themselves. This information usually arrives through credit card applications and electronic Medicare enrollment, and it includes telephone numbers, Social Security numbers, driver’s license numbers, and employee identification numbers. Though it may sound convenient to identify patients by their Social Security numbers alone when they apply for healthcare, many people do not want their Social Security numbers made available to anyone, so these numbers are among the first fields to be removed or masked.
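Social Security numbers are a common target for removal because they follow a predictable format. A minimal sketch of scrubbing them from free text is shown below; it assumes the numbers are written in the dashed `NNN-NN-NNNN` form, and the replacement marker is an arbitrary choice. Numbers written without dashes would not be caught by this pattern.

```python
import re

# Matches only the dashed form, e.g. 123-45-6789 (an assumption of this sketch).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssns(text: str) -> str:
    """Replace every dashed Social Security number in the text with a marker."""
    return SSN_PATTERN.sub("[SSN REDACTED]", text)
```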
In my opinion, the most important types of generalization are stimulus generalization, response generalization, maintenance, and data generalization. Of these, data generalization is the one that applies to de-identification. Data generalization is the process of summarizing data by replacing relatively low-level values with higher-level concepts, for example replacing an exact age with an age range. The Data Generalization Guide is a useful book for data scientists, analysts, programmers, decision makers, and other people involved in statistical or economic research.
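To make data generalization concrete, the sketch below replaces two low-level values with coarser ones: an exact age becomes an age band, and a postal code keeps only its leading digits. The band width of 10 years and the three retained digits are illustrative defaults, not recommended settings.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as "30-39"."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters of a postal code and star out the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)
```

With these defaults, `generalize_age(37)` gives `"30-39"` and `generalize_zip("90210")` gives `"902**"`.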
Healthcare Data Security
Data security is a hot topic in both the public and private sectors. Data security experts debate how patient data sets should be kept confidential and free from prying eyes. They also argue about the legalities and consequences of keeping confidential patient data sets out of the reach of the government. The controversy over privacy and data security continues, and data de-identification techniques play a role in that debate.
Data De-Identification with EHR
EHR (Electronic Health Record) software applications are made up of various components, some of which facilitate data de-identification. A medical records database, for instance, can be used to automatically de-identify certain parts of a patient’s history. Similarly, historical patient information can be extracted from databases, and the software can also create documents that outline how care plans should be written. By using these applications, healthcare providers can create, manage, and access all the personal data of their patients, which is why data security regulations have been implemented to protect personal health information.
Data Masking
Data masking is a technique used to anonymize certain portions of a large database or collection. It involves generating a copy of a data set in which sensitive values are replaced or obscured so that individuals cannot be identified from it. Examples of data that could be masked include names and demographic information. This technique can be used together with other data de-identification methods to further protect the identity of individuals. Data masking helps reduce the likelihood of re-identification, while at the same time reducing the risk of abuse by unscrupulous organizations.
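A masked copy of a small record set might be produced along the following lines. This is only a sketch: the field names `name` and `ssn`, the choice to reduce names to initials, and the choice to keep the last four digits of the number are all assumptions for illustration.

```python
def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits, e.g. "***-**-6789"."""
    digits = [ch for ch in ssn if ch.isdigit()]
    return "***-**-" + "".join(digits[-4:])


def mask_name(name: str) -> str:
    """Reduce each part of a name to its initial, e.g. "J. D."."""
    return " ".join(part[0] + "." for part in name.split())


def masked_copy(records):
    """Return new records with the hypothetical name and ssn fields masked."""
    result = []
    for record in records:
        copy = dict(record)  # shallow copy so the original stays intact
        if "name" in copy:
            copy["name"] = mask_name(copy["name"])
        if "ssn" in copy:
            copy["ssn"] = mask_ssn(copy["ssn"])
        result.append(copy)
    return result
```

Keeping the last four digits preserves some usefulness for matching, at the cost of leaving partial information in the masked copy; a stricter policy would mask the whole value.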