I was just talking to my wife about something like this the other day (I don't mind skipping over things, but she would prefer if it was automatic). We mostly do audiobooks, so my current thought is to:
1. Transcribe the book to get timestamps (simple with Whisper models for English). Time per book depends on the CPU and on the timestamp resolution (word, phrase, or sentence timing); I can transcribe a 60-hour audiobook in a little under an hour.
2. Work out the prompt to feed to the LLM, then feed it a chunk of text at a time (probably around 3000 words) along with that prompt, which spells out exactly what to filter and that the text is an excerpt from a book, and get back the locations where flagged things happen, if any. I would also have it summarize anything plot relevant in the parts it says to skip (the worst part is when scenes are plot relevant). My current thought is to filter by types of scenes rather than specific word lists, but word filtering should not be too difficult either.
3. Map word locations back to time stamps in the audio file.
4. Encode the audio file, skipping parts that were flagged.
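For what it's worth, here is a rough Python sketch of how steps 2 through 4 could fit together once the transcript exists. Everything in it is my own placeholder: the `Word` record, the function names, and the idea that the LLM hands back inclusive word-index ranges are all assumptions, not a finished design. The last helper builds a filter string for ffmpeg's `aselect`/`asetpts` audio filters, which is only one possible way to do the encode step.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its timing (hypothetical record layout)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def chunk_words(words, chunk_size=3000):
    """Yield (offset, chunk) pairs so chunk-local indices map back to global ones."""
    for offset in range(0, len(words), chunk_size):
        yield offset, words[offset:offset + chunk_size]


def flagged_to_time_ranges(words, flagged):
    """Turn inclusive (first_word, last_word) global index pairs into (start, end) seconds."""
    return [(words[first].start, words[last].end) for first, last in flagged]


def merge_ranges(ranges):
    """Sort time ranges and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def keep_ranges(skip, total_seconds):
    """Invert the skip ranges: return the parts of the audio that should be kept."""
    keep = []
    cursor = 0.0
    for start, end in merge_ranges(skip):
        if start > cursor:
            keep.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < total_seconds:
        keep.append((cursor, total_seconds))
    return keep


def aselect_filter(keep):
    """Build an ffmpeg audio filter string that keeps only the given ranges.

    Assumption: the encode step uses ffmpeg's aselect and asetpts filters.
    """
    terms = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in keep)
    return f"aselect='{terms}',asetpts=N/SR/TB"
```

The offset that `chunk_words` yields is there so that whatever index the LLM reports inside a chunk can be added back to get a position in the whole book. The string from `aselect_filter` would be passed to ffmpeg as an audio filter; for a long book with many cuts, cutting segments and concatenating them may turn out to be more practical.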
I may sit down and play with this in the next few weeks. If I get it working, I would be willing to post the code on GitHub, but it would be for personal use with your own files only (I am not touching any copyright stuff with a 10-foot pole as an individual). I'm also in the boat of "don't edit the author's voice" and stuff, and some books will be more difficult to judge, like Wayne.
The other issue I have is that it gets a lot harder to recommend books without worrying about what is in them if, in the middle of a conversation, you don't remember what was filtered. There have been enough times where I recommended a book/show/movie that had scenes I forgot about, and the person I recommended it to did not appreciate that...
I could maybe look into modifying the script to run against an ebook, which just changes the ingest and editing steps, but I'm not sure I'll get to that because the EPUB format is really annoying to work with.
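If I do get to the ebook side, the ingest step might look something like the sketch below. An EPUB is a ZIP archive of XHTML files, so the standard library is enough for a first pass. The function names are made up for illustration, and sorting the files by name is a shortcut I am assuming is good enough for a prototype: the real reading order lives in the package's spine, which this does not parse.

```python
import zipfile
from html.parser import HTMLParser

# Closing one of these tags ends a block of text, so insert a space there.
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextCollector(HTMLParser):
    """Collect visible text from XHTML, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def xhtml_to_text(markup):
    """Strip tags from one XHTML document and collapse runs of whitespace."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return " ".join("".join(parser.parts).split())


def read_epub_text(path):
    """Return one plain-text string per XHTML/HTML file in the archive.

    Assumption: files are read in filename order, not spine order.
    """
    chapters = []
    with zipfile.ZipFile(path) as book:
        names = sorted(
            name for name in book.namelist()
            if name.lower().endswith((".xhtml", ".html", ".htm"))
        )
        for name in names:
            markup = book.read(name).decode("utf-8", errors="replace")
            chapters.append(xhtml_to_text(markup))
    return chapters
```

The text that comes out could go through the same chunk-and-prompt step as the transcript. Writing the filtered text back into a valid EPUB is the part I expect to be annoying, and this sketch does not attempt it.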