BookAngel Idea

Treamayne · August 17, 2023

20 minutes ago, Lego Mistborn said:

I would disagree. Sure if an explicit scene is coming you might be able to notice and skip, but how do you know how far to skip? And what if it comes up on you and you didn't notice? Or cuss words that pop up randomly? I think those make this project a valid idea.

You may be able to mitigate legal risk by making this a Calibre plug-in. Calibre already supports heuristic processing, and similar plug-ins exist (example). If you are only providing a "filter" for somebody to apply to their own purchased property, that may change the legal ramifications. Your plug-in could, theoretically, just reach out to grab the "filter" data when applied to an ebook in the library, launch the Calibre converter, and use its tools and filters to make a "<title>-edit" EPUB (or chosen format) version that has been "cleaned."

Also, you may want to look at sites like these that already have an extensive database of book content for self/family-censors:

Spoiler

While I am significantly anti-censorship; I am much more pro-ownership. People should have the right to make content they have purchased fit their personal tastes and choices (which is why all of my ebooks are DRM stripped and Calibre managed so that I can change things I do not like - like the Table of Contents in Warbreaker - or these projects).

Edema Rue · August 17, 2023

1 hour ago, Lego Mistborn said:

Lawyer! @Edema Ruh Can you use your debate research skills to find out whether blurring text would be a copyright infringement. We'll want to look at U.S. law, don't worry about the states or other nations' laws yet.

I think I’m the least knowledgeable person here, but I absolutely can. Life did just get crazy so I might need like a week, but it’ll happen eventually.

Experience · August 17, 2023

1 hour ago, Treamayne said:

You may be able to mitigate legal risk by making this a Calibre plug-in. Calibre already supports heuristic processing, and similar plug-ins exist (example). If you are only providing a "filter" for somebody to apply to their own purchased property, that may change the legal ramifications. Your plug-in could, theoretically, just reach out to grab the "filter" data when applied to an ebook in the library, launch the Calibre converter, and use its tools and filters to make a "<title>-edit" EPUB (or chosen format) version that has been "cleaned."

Also, you may want to look at sites like these that already have an extensive database of book content for self/family-censors:

NPR.org

Common Sense Media

Book Cave

Spoiler

While I am significantly anti-censorship; I am much more pro-ownership. People should have the right to make content they have purchased fit their personal tastes and choices (which is why all of my ebooks are DRM stripped and Calibre managed so that I can change things I do not like - like the Table of Contents in Warbreaker - or these projects).

But do those websites actually change the book, or just tell you what is in it? I know Common Sense Media is very detailed about what a book includes, but I'm pretty sure that's all they do?

Treamayne · August 17, 2023

On 8/17/2023 at 9:44 AM, Experience said:

But do those websites actually change the book, or just tell you what is in it? I know Common Sense Media is very detailed about what a book includes, but I'm pretty sure that's all they do?

Correct. Those references were for the part of the discussion about which topics are in which books (since it was mentioned previously that somebody would need to read and annotate areas that may want filters). At the most basic (and most easily implemented) level, if a book is known to have profanity, the tool (assuming an ebook solution, and assuming a conversion process that edits out or changes the content for personal copies) would use a search-and-replace (probably heuristic or regex, to get word variants) to "filter" that content.
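A minimal sketch of that search-and-replace idea, using only Python's standard `re` module. The word list, the replacements, and the function names (`build_pattern`, `match_case`, `filter_text`) are hypothetical placeholders for illustration, not part of any existing tool or Calibre plug-in. The pattern catches a few suffix variants and uses word boundaries so that longer, unrelated words are left alone.

```python
import re

# Hypothetical mapping from a base word to its replacement. The words are
# mild placeholders chosen for illustration, not a real filter list.
REPLACEMENTS = {"darn": "dang", "heck": "shoot"}


def build_pattern(words):
    """Match each base word, plus a few common suffixes, as whole words only."""
    alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.compile(rf"\b({alternatives})(s|ed|ing)?\b", re.IGNORECASE)


def match_case(source, replacement):
    """Copy simple capitalization (ALL CAPS or Leading Cap) from the source word."""
    if source.isupper():
        return replacement.upper()
    if source[0].isupper():
        return replacement.capitalize()
    return replacement


def filter_text(text, replacements=REPLACEMENTS):
    """Replace every listed word (and its suffix variants) in the text."""
    pattern = build_pattern(replacements)

    def swap(match):
        base, suffix = match.group(1), match.group(2) or ""
        return match_case(base, replacements[base.lower()]) + suffix

    return pattern.sub(swap, text)


print(filter_text("Darn it, that darned thing!"))
print(filter_text("Don't heckle him."))  # "heckle" is not a variant of "heck"
```

A fixed suffix list like this is only a starting point; irregular variants and creative spellings would still need to be added by hand.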

The rest of the utility of those sites would be, for example, looking at the review of <Book> and seeing what content it has already been flagged as having. You might then query the Entertainment section (of this forum) to see if anybody has read that book and is willing to tell you the chapter/page/section numbers that contain the content to be filtered (maybe even keywords to mark where a filter should begin and end). Alternatively, if the project were to become large enough, you might be able to ask one or more of those companies if an API can be used to access their database directly to "flag" the filterable content for those books.

Mostly they were just examples to aid in brainstorming possible courses of action (COAs) and solutions.

For example, if the question was "what content might need filters in the J. D. Robb '. . . In Death' series?", I would be able to say that I have read the entire series (novels and novellas) and can attest to which topics might meet the filter requirements.

Note: (Spoilered for content)

Spoiler

Depending on how strict a person's personal preference is, this series should probably be avoided. The concern goes deeper than the language (which has both normal and fantasy-replacement slang and slurs) or the police-procedural level of violence. Because the protagonist is a homicide detective, the damage done to victims can be rather thoroughly described, even if the "on-screen" violence can sometimes be very "Hollywood." Deeper even than that is the backstory of the main characters, which can be very upsetting for many readers (details only in PM if somebody wants spoiler-level detail on the kind of content in the backstories).

Hope that helps

Edited December 4, 2023 by Treamayne
SPAG

dannnex · August 17, 2023

Quote

P.S. @dannnnnnnnnnnnnnnnnnnex are you studying CS or do you just have a working understanding of programming, and what does that knowledge amount to?

yeah i'm majoring in Software Engineering

still at the very beginning but im pretty fluent in python

Littlefish2967 · December 4, 2023

Did this ever get anywhere? I really want it to be a thing, and would be happy to pitch in to help. The whole point of such a website is that I don't want to read the scenes that need to be taken out. Quite the conundrum. And most AI platforms won't edit something if it's explicit content. I think.

Experience · December 4, 2023

9 minutes ago, Littlefish2967 said:

Did this ever get anywhere? I really want it to be a thing, and would be happy to pitch in to help. The whole point of such a website is that I don't want to read the scenes that need to be taken out. Quite the conundrum. And most AI platforms won't edit something if it's explicit content. I think.

We never really got far. I just don't have the know-how for the necessary coding.

Littlefish2967 · December 4, 2023

14 minutes ago, Littlefish2967 said:

Ok. Just curious. Thank you!

TimmyT · May 18, 2024

Following this topic. I'd be willing to download and pay for a subscription for a service like this. It would be nice to take the few scenes/words that are in something and either switch the word with an adjective that isn't a curse word or just delete NSFW type scenes from a book altogether.

Following topic! I'd be interested in buying a subscription/downloading an ebook app for my family. It would be great if it could remove certain content from ebooks or replace it with a different adjective or less NSFW-type wording. In most cases, that content isn't needed in the book at all to tell the story. If it is needed... this isn't the service for you anyway, because you're clearly reading books where you want that kind of content.

Angelina · September 21, 2024

Would it be useful for both audiobooks and digital versions of books, such as Kindle? You could just blur out the parts of the book you don't want to read.

Duxredux · September 21, 2024

Interesting. Hadn't seen this before. I've thought about how to do something like this for a while, but unfortunately it's very difficult to do without someone personally going through and reading the before and after.

I don't know much of the legalese, but I've relatively recently learned Python and a bit of AI coding with NLPs, though nothing too extensive. I can definitely say that modifying audio is waaay more challenging than editing text. For anyone that has ever tried to use voice commands on their phone, it's pretty obvious that we aren't at 100% accuracy for speech recognition. False positives for audio recognition could render huge portions of books unintelligible. What if the heroes in the book decide to head off to Hoover Dam? What if you want to listen to a podcast hosted by Brandon and Dan?

Beyond that, content filtering sounds simple enough, but in practice it can be extremely difficult because of how nuanced language is. It's a classification task: correctly identify the language that should be filtered and leave the language that should not. Like any classifier, it produces both false negatives and false positives. Innuendo is a likely culprit for false negatives because the words themselves are benign unless used in that specific phrasing. Wayne, for example, would have several lines that would totally fly under the radar but would make Marasi blush. False positives are cases where the language could be explicit but isn't in the specific usage. This isn't meant to be innuendo, but I don't have better examples:

Spoiler

Sentences that may be flagged depending on how you design your filtering system:

John stood at the grill, barbecuing the hotdogs and chicken breasts.
The pirates hauled the treasure chest from the bottom of the ocean.
The janitorial crew stripped the old wax off of the floor down to the bare tile before preparing to reapply the sealant.
"Television was better in the old days, none of this sleaze," grumbled Grandpa. "We used to watch Perry Mason and Dick Van Dyke and that was good enough for us."

If you don't see anything possibly wrong with those sentences, there isn't meant to be anything wrong with them. AI may flag it based on keyword anyway.
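To make the false-positive problem concrete, here is a small sketch of a purely keyword-based flagger run against some of the sentences above. The keyword list and the function name `flagged_keywords` are hypothetical, picked only to reproduce the effect, and the last sentence is an added control that contains none of the keywords.

```python
import re

# Hypothetical keyword list, chosen only to reproduce the false positives above.
KEYWORDS = {"breasts", "chest", "stripped", "bare"}

SENTENCES = [
    "John stood at the grill, barbecuing the hotdogs and chicken breasts.",
    "The pirates hauled the treasure chest from the bottom of the ocean.",
    "The janitorial crew stripped the old wax off of the floor down to the bare tile.",
    "The heroes decided to head off to the lake.",  # added control sentence
]


def flagged_keywords(sentence, keywords=KEYWORDS):
    """Return the keywords found in a sentence, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sorted(set(tokens) & set(keywords))


for sentence in SENTENCES:
    hits = flagged_keywords(sentence)
    print("FLAG" if hits else "ok  ", hits, sentence)
```

Every benign sentence except the control gets flagged, because the check looks only at individual words and never at how they are used.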

Beyond this, it can be difficult based on subject matter. A scientific article on the mating habits of the red herring could very well get flagged. An article or section trying to raise awareness of and speak out against date rape would likely get flagged even though the purpose of such an article is to raise morality. There are also cases where the word itself is used not as a curse but as a valid descriptor, for example alternative names for Braize or the Threnodite forest that is Silence Montane's home.

Even after false positives, it can be very difficult to create filters for content that would/should be filtered, but is plot relevant. For more extreme cases where replacing keywords won't reduce the nature of the subject matter, someone might have to summarize the scene and compare it with the original to make sure that minimal content is lost for comprehension. Trusting the AI to do that is very risky.

Here's a comic that might illustrate why altering language and retaining comprehension can be difficult.

Separate from all of this complexity is that each reader will have different tolerances for what they would want filtered. Some may be totally fine with the violence of Steel Inquisitors but not Siri and Susebron spending time as a legally married couple and vice versa. It's why VidAngel gives a breakdown for the specific content that the viewer wants filtered.
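One way to picture per-reader tolerances is to tag each flagged passage with a category and let each reader choose which categories to apply. The sketch below is an assumption about how such data might be organized; `FlaggedSpan`, `spans_to_filter`, and the category names are hypothetical and are not taken from VidAngel or any other service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedSpan:
    """A passage somebody has marked, identified by word positions in the book."""
    start: int     # index of the first word in the passage
    end: int       # index one past the last word in the passage
    category: str  # hypothetical label such as "violence" or "romance"


def spans_to_filter(spans, reader_filters):
    """Keep only the spans whose category this particular reader wants filtered."""
    return [span for span in spans if span.category in reader_filters]


spans = [
    FlaggedSpan(10, 50, "violence"),
    FlaggedSpan(120, 180, "romance"),
    FlaggedSpan(300, 305, "profanity"),
]

# One reader filters only romance; another filters violence and profanity.
print(spans_to_filter(spans, {"romance"}))
print(spans_to_filter(spans, {"violence", "profanity"}))
```

Keeping the flags separate from the preferences means the same annotation work can serve readers with opposite tolerances.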

I've thought a bit more since my initial post, and with some time I could write a program to flag potentially explicit text in an EPUB format or something similar, but for all of the reasons stated it would still need someone personally validating the changes. Even then, catching innuendo would be hard enough that pretty much all of the Wax and Wayne books would likely involve just flagging everything with Wayne. For a book I hadn't read, it would take someone who has read it to personally tell me if there were Wayne-esque characters or key scenes that should be filtered.
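A rough sketch of what such a flagging program might look like, using only the Python standard library. An EPUB is a ZIP archive of XHTML files, so this walks the archive, pulls out paragraph text, and reports paragraphs that contain a keyword so a person can review them. The names `flag_epub` and `_ParagraphExtractor` are hypothetical, and reading files in sorted name order is a simplification: a real tool would follow the reading order declared in the book's package file.

```python
import re
import zipfile
from html.parser import HTMLParser


class _ParagraphExtractor(HTMLParser):
    """Collect the visible text of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._depth = 0
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def flag_epub(path_or_file, keywords):
    """Return (file name, paragraph index, text) for paragraphs needing review."""
    if not keywords:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    flagged = []
    with zipfile.ZipFile(path_or_file) as book:
        # Simplification: sorted file names, not the declared reading order.
        for name in sorted(book.namelist()):
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            parser = _ParagraphExtractor()
            parser.feed(book.read(name).decode("utf-8", errors="replace"))
            for index, paragraph in enumerate(parser.paragraphs):
                if pattern.search(paragraph):
                    flagged.append((name, index, paragraph))
    return flagged
```

This only produces a review list. It does not change the book, and it inherits every false-positive and false-negative problem described above, so the human validation step is still required.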

Edited September 23, 2024 by Duxredux
added thought

nyxholas · November 27, 2024

I was just talking to my wife about something like this the other day (I don't mind skipping over things, but she would prefer if it was automatic). We mostly do audiobooks, so my current thought is to:

1. Transcribe the book to get time stamps (simple using Whisper models for English). Time per book depends on the CPU and on the resolution of the time stamps (word, phrase, or sentence timing); I can transcribe a 60-hour audiobook in a little under an hour.

2. Determine the prompt to feed to the LLM, then feed it the text every so many pages (probably about 3000 words at a time) along with the prompt describing exactly what to filter and noting that the text is a subset of a book, and get back the locations of when/if things happen. I would also have it summarize anything plot relevant in the parts it says to skip (the worst part is when scenes are plot relevant). My current thought is to filter by type of scene rather than by specific word lists, but word filtering should not be too difficult either.

3. Map word locations back to time stamps in the audio file.

4. Encode the audio file, skipping parts that were flagged.
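The bookkeeping in steps 2 through 4 can be sketched in plain Python. This assumes the transcription step has already produced a list of `(word, start_seconds, end_seconds)` tuples and that the LLM step has returned inclusive word-index ranges to skip; both data shapes, and the function names, are assumptions for illustration. Transcription, the LLM call, and the actual audio encoding are left out.

```python
def chunk_words(words, chunk_size=3000):
    """Yield (offset, chunk) pairs so positions in a chunk map back to the book."""
    for offset in range(0, len(words), chunk_size):
        yield offset, words[offset:offset + chunk_size]


def word_ranges_to_cuts(timed_words, flagged_ranges, padding=0.0):
    """Convert inclusive word-index ranges into merged (start, end) times to cut."""
    cuts = sorted(
        (max(0.0, timed_words[first][1] - padding), timed_words[last][2] + padding)
        for first, last in flagged_ranges
    )
    merged = []
    for start, end in cuts:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching cuts become one longer cut.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_segments(duration, cuts):
    """Return the (start, end) times of audio to keep, given sorted merged cuts."""
    segments, cursor = [], 0.0
    for start, end in cuts:
        if start > cursor:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        segments.append((cursor, duration))
    return segments


# Tiny made-up transcript: five words, half a second each.
timed = [("a", 0.0, 0.5), ("b", 0.5, 1.0), ("c", 1.0, 1.5), ("d", 1.5, 2.0), ("e", 2.0, 2.5)]
cuts = word_ranges_to_cuts(timed, [(1, 2), (2, 3)])
print(cuts)
print(keep_segments(2.5, cuts))
```

The kept segments are what an encoder would then concatenate into the filtered file. Adding the chunk offset to any index the LLM reports is what maps a position inside a 3000-word chunk back to the whole transcript.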

I may sit down and play with this in the next few weeks. If I get it working, I would be willing to post the code on GitHub, but it would be for personal use with your own files only (I am not touching any copyright stuff with a 10-foot pole as an individual). I'm also in the boat of "don't edit the author's voice" and stuff, and some books will be more difficult to judge, like anything with Wayne.

The other issue I have is that it becomes a lot harder to recommend books without worrying about what is in them if, during a conversation, you don't remember what was filtered. There have been enough times where I recommended a book/show/movie that had scenes I forgot about, and the person I recommended it to did not appreciate that...

I could maybe look into modifying the script to run against an ebook, which just changes the ingest and editing steps, but I'm not sure I'll get to that because the EPUB format is really annoying to work with.

avid_reader · May 10, 2025

This already exists for Audible audiobooks. It's called Siftbooks.com. You can submit a book request and their app will review it for profanity and sexually explicit content within something like 24-48 hours.
