Making 10M government PDF documents searchable – FlowingData

Government organizations love to distribute documents as PDF files, which are easy to forward and print. The trouble starts when you want to find and access one later among millions of other files. GovScape, a research project from the University of Washington and Boston University, provides a search interface for the PDFs in the End of Term Web Archive’s 2020 crawl.

The code for GovScape is open source and available on GitHub. I have a feeling such a tool will grow more important going forward.

