
Anna’s Archive Claims Massive Spotify Music Scrape
In what could be the largest music piracy operation in history, shadow library Anna’s Archive announced that it has scraped 86 million audio files from Spotify, which it says account for 99.6% of what people actually listen to on the platform. The group claims to have backed up virtually all of Spotify’s relevant music catalog, about 300 terabytes of data distributed through bulk torrents. The accompanying metadata also offers an unusually detailed look at streaming habits and at how listening is distributed across the catalog.
Unprecedented Scale and Technical Details
The scale of the collection is staggering. According to the group’s claims, it captured metadata for 99% of Spotify’s 256 million tracks, including 186 million unique International Standard Recording Codes (ISRCs). For comparison, MusicBrainz, the largest legal open music database, contains only about 5 million entries, which would make the 186 million ISRCs in Anna’s Archive’s collection roughly 37 times larger.
Preservation Methodology and Quality
The group prioritized its work using Spotify’s own popularity metric to decide what to preserve first. Popular tracks were kept in their original OGG Vorbis format at 160 kilobits per second, which avoids the quality loss of re-encoding, while less popular content was compressed to OGG Opus at 75 kbps to conserve storage space. This approach reveals the group’s focus on practical preservation rather than comprehensive archiving.
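The two-tier scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the group’s actual tooling: the article does not give the popularity cutoff between tiers, so `HYPOTHETICAL_CUTOFF` is a placeholder, and the function names and the 210-second track length are invented for the example. The bitrates are the ones reported above, and Spotify’s popularity score runs from 0 to 100.

```python
from dataclasses import dataclass

# Bitrates reported in the article.
VORBIS_KBPS = 160  # original OGG Vorbis, kept as-is for popular tracks
OPUS_KBPS = 75     # OGG Opus re-encode for less popular tracks

# ASSUMPTION: the article does not state where the cutoff lies.
# This value is a placeholder for illustration only.
HYPOTHETICAL_CUTOFF = 1


@dataclass(frozen=True)
class Encoding:
    codec: str
    kbps: int


def choose_encoding(popularity: int, cutoff: int = HYPOTHETICAL_CUTOFF) -> Encoding:
    """Pick a storage tier from a 0..100 popularity score (hypothetical helper)."""
    if not 0 <= popularity <= 100:
        raise ValueError("popularity is expected in the range 0..100")
    if popularity >= cutoff:
        return Encoding("vorbis", VORBIS_KBPS)
    return Encoding("opus", OPUS_KBPS)


def estimated_size_bytes(kbps: int, duration_seconds: float) -> float:
    """Approximate audio payload size: bits per second * seconds / 8."""
    return kbps * 1000 * duration_seconds / 8


# A 3.5-minute (210 s) track, an assumed length, in each tier.
popular = choose_encoding(80)
obscure = choose_encoding(0)
popular_size = estimated_size_bytes(popular.kbps, 210)
obscure_size = estimated_size_bytes(obscure.kbps, 210)
```

Under these assumptions the Opus tier needs a little under half the space per track, which is the storage saving the prioritization is meant to buy.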
Revealing Streaming Platform Realities
The metadata shows how skewed listening on Spotify is. Over 70% of Spotify’s 256 million tracks have a popularity score of exactly zero, meaning they receive essentially no listens. Only about 0.1% of songs (roughly 210,000 tracks) have popularity scores of 50 or higher, yet these account for the vast majority of all listening activity. The top three songs alone have more total plays than the bottom 20 to 100 million songs combined.
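As a quick sanity check, the percentages above can be turned into track counts. The sketch below uses only the figures quoted in this article; the variable and function names are made up for the example, and "over 70%" is treated as a lower bound of exactly 70%.

```python
TOTAL_TRACKS = 256_000_000        # catalog size cited in the article
ZERO_POPULARITY_PERCENT = 70      # "over 70%", used here as a lower bound
HIGH_POPULARITY_TRACKS = 210_000  # tracks with a popularity score of 50 or higher


def share_of_catalog(count: int, total: int = TOTAL_TRACKS) -> float:
    """Fraction of the catalog that `count` tracks represent."""
    return count / total


# Integer arithmetic avoids floating-point rounding in the track count.
zero_popularity_tracks = TOTAL_TRACKS * ZERO_POPULARITY_PERCENT // 100
high_popularity_share = share_of_catalog(HIGH_POPULARITY_TRACKS)

print(f"Tracks with zero popularity (at least): {zero_popularity_tracks:,}")
print(f"Share of catalog scoring 50 or higher: {high_popularity_share:.3%}")
```

On these numbers, at least 179.2 million tracks sit at zero popularity, and 210,000 tracks works out to about 0.08% of the catalog, which the article rounds to 0.1%.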
Genre Distribution and Audio Characteristics
Electronic/Dance emerged as the largest genre category by artist count with 520,075 artists, followed by Rock (370,179) and World/Traditional (202,529). Audio analysis revealed that track tempos cluster around 120 BPM in a roughly normal distribution, that most tracks have low “speechiness” and “instrumentalness” scores, and that C major and G major are the most common keys. Approximately 13.5% of all tracks are tagged as explicit content.
Preservation vs. Piracy: The Ethical Debate
Anna’s Archive frames its operation as cultural preservation rather than piracy, stating that it is building “a music archive primarily aimed at preservation.” The group argues that existing archiving efforts focus too heavily on popular artists and audiophile formats, leaving obscure music vulnerable to disappearance if platforms change policies or shut down. The decentralized torrent distribution creates redundancy that can’t be eliminated by any single entity.
Legal Implications and Industry Response
Spotify has acknowledged the breach, with a spokesperson stating that “a third party scraped public metadata and used illicit tactics to circumvent DRM to access some of the platform’s audio files.” The statement refers only to “some” audio files and does not confirm the 86 million figure. Anna’s Archive already faces significant legal pressure: Belgium has issued blocking orders carrying fines of up to €500,000, the UK High Court ordered blocks in December 2024, and Germany’s major ISPs blocked the site’s main domains in October 2025.
Industry Impact and Future Consequences
The financial implications for artists are significant, as Spotify pays artists an estimated $0.003 to $0.005 per stream. According to industry calculators, 1 million streams would yield approximately $4,370 in royalties, revenue that is lost when listening shifts to free torrent downloads. The legal response is expected to be severe, with the music industry likely to pursue aggressive litigation given the scale of the infringement.
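The royalty figures above are easy to cross-check. The sketch below treats the quoted per-stream range as a flat rate, which is a simplification, since real payouts vary by country, subscription tier, and rights split. The function names are invented for the example.

```python
from decimal import Decimal

# Per-stream range quoted in the article (treated as flat rates: a simplification).
LOW_RATE = Decimal("0.003")
HIGH_RATE = Decimal("0.005")


def royalty_range(streams: int) -> tuple[Decimal, Decimal]:
    """Low and high royalty estimates in dollars for a given stream count."""
    return streams * LOW_RATE, streams * HIGH_RATE


def implied_rate(total_payout: Decimal, streams: int) -> Decimal:
    """Per-stream rate implied by a total payout."""
    return total_payout / streams


low, high = royalty_range(1_000_000)
rate = implied_rate(Decimal("4370"), 1_000_000)

print(f"1 million streams: ${low:,.0f} to ${high:,.0f}")
print(f"Rate implied by $4,370 per million streams: ${rate}")
```

One million streams falls between $3,000 and $5,000 under the quoted range, and the $4,370 calculator figure implies $0.00437 per stream, so the two numbers in the paragraph are consistent with each other.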
Currently, only metadata has been fully released, with audio files rolling out gradually through bulk torrents starting with the most popular tracks. The data is already distributed across thousands of torrent nodes worldwide, making complete eradication nearly impossible. This distribution method represents both the strength of the archive’s preservation model and the challenge for rights holders seeking to protect their intellectual property in the digital age.




