Data Scraping: How Much is Too Much?

Almost two years ago, I wrote about LinkedIn’s suit against hiQ Labs, Inc. In that case, LinkedIn sued hiQ Labs for scraping its users’ public profiles and selling the results as part of an employee training and retention tool. There, the Court found that hiQ Labs violated the social media company’s terms of service because, as it states very clearly in LinkedIn’s user agreement, “NO SCRAPING.” (I’m paraphrasing, loudly.)

We now have a second court decision ruling against scraping, but for a very different reason than in the hiQ action.

This time, the venue is the 11th Circuit Court of Appeals, and this is that court’s second decision in the case since the dispute began in 2016. In its first decision (back in 2020), the 11th Circuit wrote: “Warning: This gets pretty dense (and difficult) pretty quickly.” That’s true! But don’t be scared. I think we can summarize it all succinctly without getting lost.

The plaintiff is Compulife Software, Inc., whose products are a database and software that allow licensees (generally, insurance agents) to compare life insurance quotes. These agents/licensees can incorporate Compulife’s products into their websites, but the public can also access Compulife’s products on its own site, www.term4sale.com.

The defendants are a group of individuals who used bots to scrape Compulife’s publicly accessible site and database and built their own, competing insurance quote site. This group (they never actually formed a business entity) obtained the source code for Compulife’s software under false pretenses. (One of the group’s members contacted Compulife, claiming that he worked for one of Compulife’s licensees, and asked for a copy of the source code. Compulife gave it to him.) The defendants used this code to engineer the scraping of Compulife’s website.

Based on this, Compulife accused the defendants of violating the federal Defend Trade Secrets Act, as well as the analogous Florida Uniform Trade Secrets Act. (There were also copyright infringement claims relating to defendants’ unauthorized use of Compulife’s software, but that’s for another day.) To prevail on either claim, Compulife had to establish that (1) it had a trade secret, and (2) the defendants misappropriated that trade secret.

Initially, the District Court held that Compulife didn’t have a protectable trade secret because its entire database could be accessed by the public. However, in its 2020 decision, the Appeals Court reversed this, concluding that the database was indeed a trade secret because, among other things, Compulife “goes to great lengths to secure its database,” and that even though the individual, publicly available quotes on the Compulife site were not trade secrets, Compulife’s compilation of them could be.

On this latest appeal, the main issue was whether the defendants’ use of bots to scrape Compulife’s database was misappropriation. The 11th Circuit, in addition to reaffirming its original holding that Compulife’s database was a trade secret, concluded that defendants misappropriated that secret when they used bots to “commit a scraping attack that acquired millions of variable-dependent insurance quotes.” That quantity was a key factor: As the Court wrote, “even if individual quotes that are publicly available lack trade secret status, the whole compilation of them (which would be nearly impossible for a human to obtain through the website without scraping) can still be a trade secret,” and the defendants’ use of bots to do what a human could not manually accomplish represented improper means.

The Appeals Court, however, was careful not to condemn scraping as a whole, writing “[i]t is important to note that scraping and related technologies (like crawling) may be perfectly legitimate.” (Italics from the court’s opinion).

This seems pretty straightforward, particularly given defendants’ acquisition of Compulife’s code under false pretenses. However, I’m curious to see future rulings that shed more light on when scraping is legitimate and, more importantly, on what factors courts look at to determine when scraping is OK and when it’s not. Is it the sheer volume of material taken? The impact on the plaintiff’s business? Something else?

When the 11th Circuit (or another court) enlightens us, I’m sure I’ll be back to write about it.