In the current age of hacking and whistleblowing, the internet contains massive troves of leaked information. These complex datasets can be goldmines of revelations in the public interest, if you know how to access and analyze them. For investigative journalists, hacktivists, and amateur researchers alike, this book provides the technical expertise needed to find and transform unintelligible files into groundbreaking reports.
Guided by renowned investigative journalist and infosec expert Micah Lee, who helped secure Edward Snowden’s communications with the press, you’ll learn the tools, technologies, and programming basics needed to crack open and interrogate datasets freely available on the internet or your own private datasets obtained directly from sources. Each chapter features hands-on exercises using real hacked data from governments, companies, and political groups, as well as interesting nuggets from datasets that never made it into published stories. You’ll dig into hacked files from the BlueLeaks law enforcement records, analyze social-media traffic related to the 2021 attack on the U.S. Capitol, and get the exclusive story of privately leaked data from anti-vaccine group America’s Frontline Doctors.
Along the way, you’ll learn:
How to secure and authenticate datasets and safely communicate with sources
Python programming basics needed for data science investigations
Security concepts, like disk encryption
How to work with data in EML, MBOX, JSON, CSV, and SQL formats
Tricks for using the command-line interface to explore datasets packed with secrets
A fantastic resource for those analyzing leaked data.
5 stars
This book covers the technology and tradecraft a reporter would need to safely handle leaked data. I've worked in this area since 2009, and this is hands down the best resource I've ever seen: a comprehensive exposition of the methods, with lots of easy-to-follow, hands-on exercises. I already knew about 80% of what Lee suggests; the things I personally found most valuable were the introduction to Docker Compose and the Aleph and Dangerzone tools. I run a Substack where I publish on similar matters, so I'm used to writing for those who've newly developed an urge to dig deeper, and Lee does an excellent job of picking starting points accessible to all, then building on them.