Challenges of protecting confidentiality in social media data and their ethical import

Rossi, Arianna; Arenas, Monica P.; Kocyigit, Emre; HANI, Moad

doi:10.1109/EuroSPW55150.2022.00066

Article (Scientific journals)

2022 • In 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), (The 1st International Workshop on Ethics in Computer Security (EthiCS 2022)), p. 554-561

Peer reviewed

Permalink
https://hdl.handle.net/20.500.12907/44959

Files

Full Text

Challenges of Protecting Confidentiality in Social Media Data and Their Ethical Import.pdf

Author postprint (133.04 kB)

Creative Commons License - Public Domain Dedication

[en] This article discusses the challenges of pseudonymizing unstructured, noisy social media data for cybersecurity research purposes and presents an open-source package developed to pseudonymize personal and confidential information (i.e., personal names, companies, and locations) contained in such data. Its goal is to facilitate compliance with EU data protection obligations and the upholding of research ethics principles like the respect for the autonomy, privacy and dignity of research participants, the social responsibility of researchers, and scientific integrity. We discuss the limitations of the pseudonymizer package, their ethical import, and the additional security measures that should be adopted to protect the confidentiality of the data.

Details

Keywords :

GDPR compliance; Named Entity Recognition; Pseudonymization; Research ethics; Security measures; Cyber security; Open source package; Personal and confidential information; Research purpose; Social media data; Computer Networks and Communications; Hardware and Architecture; Information Systems; Information Systems and Management; Safety, Risk, Reliability and Quality

Abstract :

[en] This article discusses the challenges of pseudonymizing unstructured, noisy social media data for cybersecurity research purposes and presents an open-source package developed to pseudonymize personal and confidential information (i.e., personal names, companies, and locations) contained in such data. Its goal is to facilitate compliance with EU data protection obligations and the upholding of research ethics principles like the respect for the autonomy, privacy and dignity of research participants, the social responsibility of researchers, and scientific integrity. We discuss the limitations of the pseudonymizer package, their ethical import, and the additional security measures that should be adopted to protect the confidentiality of the data.
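
Illustrative sketch (not the authors' code): the abstract describes replacing personal names, companies, and locations with pseudonyms. The snippet below is a minimal sketch of that general idea under stated assumptions. It does not use or reproduce the API of the published Pseudonymizer package. Entity detection, which the keywords suggest relies on Named Entity Recognition, is swapped here for a small hand-written lookup table, and the names `KNOWN_ENTITIES`, `make_pseudonym`, and `pseudonymize` are hypothetical. Pseudonyms are derived with a keyed hash (HMAC-SHA256) from the Python standard library so that the same entity always maps to the same placeholder.

```python
import hashlib
import hmac
import re

# Hypothetical lookup table standing in for an entity recognizer.
# The three labels mirror the categories named in the abstract.
KNOWN_ENTITIES = {
    "PERSON": ["Alice Martin"],
    "COMPANY": ["Acme Corp"],
    "LOCATION": ["Luxembourg"],
}


def make_pseudonym(label: str, surface: str, key: bytes) -> str:
    """Derive a stable placeholder such as PERSON_1a2b3c4d from a keyed hash."""
    digest = hmac.new(key, surface.lower().encode("utf-8"), hashlib.sha256)
    return f"{label}_{digest.hexdigest()[:8]}"


def pseudonymize(text: str, key: bytes, entities=None) -> str:
    """Replace every listed entity (case-insensitive) with its pseudonym."""
    if entities is None:
        entities = KNOWN_ENTITIES
    for label, names in entities.items():
        for name in names:
            pseudonym = make_pseudonym(label, name, key)
            pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
            # A function replacement avoids any escape processing of the pseudonym.
            text = pattern.sub(lambda _match, p=pseudonym: p, text)
    return text


if __name__ == "__main__":
    secret = b"replace-with-a-secret-key"  # assumption: stored apart from the data
    print(pseudonymize("Alice Martin of Acme Corp tweeted from Luxembourg.", secret))
```

An exact-match lookup like this misses misspellings, nicknames, and hashtags, which illustrates why noisy social media text is harder to pseudonymize than structured data. The key would also need to be kept separately from the pseudonymized data, since anyone holding it can recompute the mapping for a guessed name.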

Precision for document type :

Review article

Disciplines :

Computer science

Author, co-author :

Rossi, Arianna; SnT, University of Luxembourg, Luxembourg, Luxembourg

Arenas, Monica P.; SnT, University of Luxembourg, Luxembourg, Luxembourg

Kocyigit, Emre; SnT, University of Luxembourg, Luxembourg, Luxembourg

HANI, Moad; Université de Mons - UMONS

Language :

English

Title :

Challenges of protecting confidentiality in social media data and their ethical import

Original title :

[en] Challenges of protecting confidentiality in social media data and their ethical import

Publication date :

10 May 2022

Journal title :

2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)

Publisher :

University of Genoa, CINI - Consorzio Interuniversitario Nazionale per l'Informatica, Genoa, Italy

Special issue title :

Conference: 2022 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)

Issue :

The 1st International Workshop on Ethics in Computer Security (EthiCS 2022)

Pages :

554-561

Peer reviewed :

Peer reviewed

Development Goals :

10. Reduced inequalities

Additional URL :

https://ieeexplore.ieee.org/document/9799350

Research institute :

Infortech

Funders :

FNR - Fonds National de la Recherche

Funding number :

IS/14717072; PoC20 / 15299666 / NOFAKES-PoC

Funding text :

This work has been partially supported by the Luxembourg National Research Fund (FNR): “Deceptive Patterns Online (Decepticon)” IS/14717072 and No more Fakes “NOFAKES” PoC20 / 15299666 / NOFAKES-PoC.

Commentary :

"We developed a Pseudonymizer Python package that works on English textual data and released it under a GPL v2 license. This library works with structured and unstructured data, but in the case of unstructured data, and especially highly noisy data such as social media data, the challenge is greater and thus the performance is knowingly less accurate. This software has three independent functionalities applied to different kinds of data: Companies, Geolocations, and Personal Names."

Available on ORBi UMONS :

since 16 January 2023
