What ever happened to the book search of tomorrow?
May 17, 2017 10:58 PM

How Google Book Search Got Lost - Google Books was the company’s first moonshot. But 15 years later, the project is stuck in low-Earth orbit.
posted by Chrysostom (17 comments total) 23 users marked this as a favorite
 
Still, the bulk of the work at Google Books continues to be on “search quality” — making sure that you find the Kafka passage you need, fast. It’s an unglamorous game of inches — less moonshot and more, say, satellite maintenance.

Yeah, whatever they've done with this in the last few years has utterly failed (at least for me). As someone who uses Google Books a ton, it's become obvious that they selectively hide results (that are not even in copyright and have full view enabled) in order to fulfill some unknowable and unconfigurable relevance heuristic. Which is incredibly frustrating, when you saw a particular passage yesterday that has now vanished.

Sorry for the rant.
posted by dilaudid at 11:22 PM on May 17, 2017 [12 favorites]


Google is too busy indexing us to worry about books anymore.
posted by fairmettle at 1:29 AM on May 18, 2017 [6 favorites]


Torching the Modern-Day Library of Alexandria is also worth reading, an Atlantic piece that ran about a week after the Backchannel one. Also some commentary on both pieces from Alexander Macgillivray (a lawyer who worked on Google Books), Mike Furlough, and James Grimmelmann.
posted by Nelson at 1:38 AM on May 18, 2017 [8 favorites]


Comment 4 is unavailable in this preview
posted by Devonian at 2:34 AM on May 18, 2017 [27 favorites]


I saw that Atlantic article a few weeks ago, and it drove me crazy. Referring to legal action against Google as "torching the modern-day Library of Alexandria" is absurdly hyperbolic. They weren't trying to make a library, they were trying to make a digital bookstore built on a reckless business model that stood to violate copyright law on a massive scale.

I appreciate Mike Furlough's perspective on this. To be honest, I've never given too much thought to what HathiTrust actually is. I've gotten a lot of material through HathiTrust, most of which was scanned by Google (which I know because Google puts an obnoxious watermark on every page that says "digitized by Google"). I'm not fundamentally opposed to scanning books, but I'd rather it be a consortium of libraries instead of a centralized database controlled by one entity. Especially when Google Books

[Paragraphs 3 to 5 are not shown in this preview.]

For what it's worth, my university's library sends like 6,000 books a month to Google to scan. There's a whole team devoted solely to paging and reshelving books for Google.
posted by shapes that haunt the dusk at 2:49 AM on May 18, 2017 [10 favorites]


Google Books seems to be the only place to find Chateaubriand's complete memoirs in English without spending a few thousand dollars, so that is cool. At the same time, finding them would have taken me a few minutes, as opposed to a few hours on a few different occasions, if they didn't selectively show results and cut off the full titles of books (most old books having longass titles in thee olde type style), so that one can't search by volume number or even see the volume number until getting into the actual text of the book.
posted by bootlegpop at 3:29 AM on May 18, 2017 [2 favorites]


> I'm not fundamentally opposed to scanning books, but I'd rather it be a consortium of libraries instead of a centralized database controlled by one entity.

The perfect is the enemy of the good. I have plenty of problems with Google Books and have ranted about them in the past, but for God's sake, Google Books and its associated search function have radically transformed my life and the lives of everyone who works with printed material. I can understand wishing it were better done, but I can't understand wishing it didn't exist unless it were done according to one's ideal of a perfect world.
posted by languagehat at 6:45 AM on May 18, 2017 [15 favorites]


But how will Chateaubriand have any incentive to keep producing valuable copyrighted works if he can't get a royalty share of that few thousand dollars?
posted by Nelson at 6:54 AM on May 18, 2017 [3 favorites]


Maybe it's a hobby he subsidizes with his steak royalties.
posted by Chrysostom at 7:03 AM on May 18, 2017 [2 favorites]


One thing I've noticed in the last few years is that Google Books appears to be able to deal with non-Latin scripts now. So, there's that.

A subject I wish the OP article had touched on is that I've come across a great many instances of books that are clearly in the public domain, because they were published in the U.S. before 1923, suddenly becoming inaccessible. Has anyone ever seen one of those come back?

My suspicion has always been that companies re-printing books will obtain an ISBN for them or something and then file fake copyright takedown requests. Or maybe even, if the takedown request process is as easy as it is for YouTube for example, antique book dealers who merely have a single copy for sale will do that in a bid to create artificial scarcity.

In any case I was very glad to start seeing HathiTrust books show up in search results; hopefully they won't put up with crap like that.

Wikisource, a project affiliated with Wikipedia with the mission of creating online but not-necessarily-scanned versions of books and other source texts, has a list of sites where such texts can be found, including ones like those mentioned in the article as well as universities and libraries around the world that are digitizing parts of their collections.
posted by XMLicious at 7:23 AM on May 18, 2017 [2 favorites]


GoogleBooks, archive.org, and HathiTrust are all essential for my research--I mean, I thanked the scanners in the acknowledgments of my last book--because very few individual libraries have an interest in acquiring Victorian religious fiction for, er, some reason. But oh, the frustrations of GoogleBooks are many:

1) The aforementioned yanking of books out of full view. Just last month, I went to double-check a quotation and discovered that GoogleBooks had shifted the copy to unviewable status. It's...a mid-19th c. novel? It's...not in copyright? And this doesn't seem to have anything to do with the scrapers who put out (frequently useless) POD copies. To make matters worse, this was not one of those instances where a Google-scanned copy, inaccessible at Google itself, could be found at archive.org or HathiTrust. (And what's up with that, exactly?)

2) Disabling library search. I have several thousand books in my various libraries; once upon a time, I could actually search them. No longer.

3) Searching the books site via the main Google page often works better than searching in GoogleBooks itself, which strikes me as what you might call counter-intuitive.

4) Search results themselves frequently make no sense. Like many Victorianists, I frequently find myself working with deadly triple-deckers. But I have to get GoogleBooks to pull them up first. Searches often result in only one volume of a novel, and in some cases, it's impossible to convince the engine to return the others. Most common result:

a) I've got a hit!
b) It's for volume III.
c) Yo, I need volumes I and II. Where are they?
d) Well, let's check "other editions" on the About page.
e) Still more volume III. Come on...
f) *various keyword permutations follow*
g) Finally, it's volumes I and II! Jeez...
posted by thomas j wise at 8:35 AM on May 18, 2017 [10 favorites]


> I can understand wishing it were better done, but I can't understand wishing it didn't exist unless it were done according to one's ideal of a perfect world.

That's not what I was trying to say at all. I use material scanned by Google every day, but I access it through HathiTrust, whose goals are fundamentally different from Google's. I never said I wish Google Books didn't exist; I'm saying we already have something like my ideal consortium of libraries, and I'd rather have that expanded than have one centralized database administered by an ad company. In other words, I'd like libraries to have a bigger role than they already do. Nobody can scan as much as Google has, but I don't care who scans the books; I care who provides access to them.
posted by shapes that haunt the dusk at 11:15 AM on May 18, 2017 [2 favorites]


Somewhat related: "Never trust a corporation to do a library's job" (by MeFi's own Andy Baio)
posted by mhum at 3:37 PM on May 18, 2017 [1 favorite]


It's a lot worse for people outside the USA. I can understand that conflicting copyright regimes are a problem to deal with, but (a), that is literally your job, and (b), books get blocked even when there is literally no possibility of them being in copyright, anywhere. And it seems totally arbitrary, too - I can sometimes access books that are currently copyrighted, and the next moment be blocked from, oh, some legal text from the 1800s.
posted by Joe in Australia at 5:34 PM on May 18, 2017 [2 favorites]


So... why, exactly, should Google keep scanning books and letting people search them? Why make them easier to search? Why continue to invest in this at all?
posted by dougfelt at 8:08 PM on May 18, 2017 [1 favorite]


thomas j wise: 1) The aforementioned yanking of books out of full view. Just last month, I went to double-check a quotation and discovered that GoogleBooks had shifted the copy to unviewable status. It's...a mid-19th c. novel? It's...not in copyright? And this doesn't seem to have anything to do with the scrapers who put out (frequently useless) POD copies. To make matters worse, this was not one of those instances where a Google-scanned copy, inaccessible at Google itself, could be found at archive.org or HathiTrust. (And what's up with that, exactly?)

I had a similar problem a few weeks ago and sent a query to Google Books asking for a book to be opened up. A couple of days later they replied and the book was put in full view. It's not ideal that I had to go through the process to see text from 1918, but this was better than nothing. In general I was pleasantly surprised that Google had people answering these kinds of queries.
posted by Kattullus at 3:15 AM on May 19, 2017 [5 favorites]


I tried that about eight years ago (or was it eight million?) and never heard back, so that's an improvement. Or perhaps my message or its reply just went astray on that occasion.

Google Books MeTa post I made in the long long ago.
posted by XMLicious at 10:50 AM on May 19, 2017 [1 favorite]



