Kurkistan Posted January 15, 2012 (edited)

Quoting Turos:
Nice progress! It works really well for me. Also, version two is up on the first post. Thank you both for the catch on that character missing the line. Oh yeah, don't worry, version two doesn't change anything that would make your transliterator incompatible.
EDIT: Also, I noticed that line breaks are removed when the tool is run. Is there a way to keep the line breaks? And I was testing some of your new conversions. Seems to have a glitch:
Erase. Erasing. Chase. Chasing. Blindly. Ice. Ices.
becomes:
.eys .eysing .eys .eysing .blindly .ahys .ahysz

As far as I've seen, it doesn't take away line breaks. Could you link a .txt and its output for me to test? I'm in the middle of making it compatible with -ed, -ly, -ish, and -able endings, as well as hunting down some fugitive c's. I'll try to get to your bug by the end of the night. Thanks for bringing it to my attention.

EDIT: Thanks for posting a new version, although I would suggest renaming the upload version to just AlethiTS, since we want it to stay compatible with the documents and posts of people who upgrade.

Edited January 16, 2012 by Kurkistan
Turos (Author) Posted January 15, 2012

Quoting Kurkistan:
As far as I've seen, it doesn't take away line breaks. Could you link a .txt and its output for me to test?

Oh, weird, it doesn't take out the line breaks when I open it in Notepad++, but it does in Notepad. That's crazy. Maybe it's just my computer.

Quoting Kurkistan:
EDIT: Thanks for posting a new version, although I would suggest renaming the upload version to just AlethiTS, since we want it to stay compatible with the documents and posts of people who upgrade.

I only changed the name of the .zip file. The actual font is still AlethiTS.ttf. The additional font is AlethiTS_lined.ttf; it just adds another font for those who want the line in space characters. Both fonts have the line through the period character. Take a look at both and see what I mean. If you want the lined spaces by default, I can change that.
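A likely explanation for the Notepad vs. Notepad++ difference is line endings rather than anything specific to one computer: readFile() rebuilds the text by joining readLine() results with a bare '\n', and writeFile() writes that string unchanged. Notepad++ accepts '\n' on its own, while the Windows Notepad of this era only starts a new line on a "\r\n" pair. Below is a minimal sketch of a fix, assuming the output should use the platform's separator; the class and method names are hypothetical and not part of the posted program.

```java
public class LineEndingsSketch {
    /**
     * Hypothetical helper: rewrites the bare '\n' breaks produced by joining
     * readLine() results into the given separator, e.g. "\r\n" for Windows Notepad.
     * Existing "\r\n" pairs are collapsed first so they are not doubled.
     */
    static String withLineSeparator(String text, String separator) {
        return text.replace("\r\n", "\n").replace("\n", separator);
    }

    public static void main(String[] args) {
        String alethi = ".eys .eysing\n.ahys .ahysz\n";
        // System.lineSeparator() (Java 7+) is "\r\n" on Windows and "\n" on Unix-like systems.
        System.out.print(withLineSeparator(alethi, System.lineSeparator()));
    }
}
```

In writeFile() this would amount to writing withLineSeparator(text, System.lineSeparator()) instead of text. An alternative with the same effect is to write each line separately and call BufferedWriter.newLine(), which emits the platform line separator.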
Kurkistan Posted January 16, 2012

Quoting Turos:
Oh, weird, it doesn't take out the line breaks when I open it in Notepad++, but it does in Notepad. That's crazy. Maybe it's just my computer.
I only changed the name of the .zip file. The actual font is still AlethiTS.ttf. The additional font is AlethiTS_lined.ttf; it just adds another font for those who want the line in space characters. Both fonts have the line through the period character. Take a look at both and see what I mean. If you want the lined spaces by default, I can change that.

*Opens .doc with Alethi script at 72*
*Screen explodes*

Ah, I see what you did there. I assumed that "AlethiTS" was simply the old version. My mistake. I prefer no lines between words, and that's how it is in the notebook pages, so I think it's better to leave it as optional.

*Goes back to work*
Kurkistan Posted January 16, 2012 (edited)

I've reached the conclusion that we deserve medals for this. No rush, but it would be nice.

Substantial reordering of the C category to simplify debugging, a substantial number of grammars added to the C category, various bugs squashed, -ed, -ly, -ish, -able, and -er suffixes added, a timer added to show how long the transliteration took, and various touch-ups. As always, but particularly with so many changes, comments are welcome.

EDIT 2: Added tests for y\n; not sure if I might want to put them in replace().

EDIT: Just ran the Odyssey again: 8 minutes, 7 seconds, with no misses on 'c' except for names and weird compounds (washingcistern, panicstricken).

/**
 * Goal: Provide an easy means of transliterating Roman letters into Alethi script using Turos's font conventions.
 *
 *
 * @author Kurkistan
 * @date 01/15/2012
 * @version 1.7.4.1
 */
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.File;
import java.io.PrintWriter;
import java.io.IOException;
import java.util.Scanner;
import java.io.BufferedReader;
import java.util.Arrays;

public class AlethiTransliterator_1_7_4_1 {
    static boolean debug_char = true;
    static boolean debug_end_e = false;

    public static void main (String[] arg) throws IOException{
        Scanner input=new Scanner(System.in);
        System.out.print("Enter input file (full name of file in same directory): ");
        String temp = input.next();
        //temp = "Test.txt";
        final double startTime = System.currentTimeMillis();
        final double endTime;
        try {
            String alethi = convertText(temp);
            if(alethi.equals("&"))
                return;
            temp = "Alethi_"+temp;
            writeFile(alethi,temp);
            if(debug_char){
                String violations = allowedCharacters(alethi); //debugging blatant errors
                if(!violations.equals(""))
                    System.out.println("Unauthorized sections in text (Line:Violation):"+"\n"+violations);
            }
        } finally {
            endTime = System.currentTimeMillis();
        }
        final double duration = endTime - startTime;
        System.out.println("Execution time: "+(duration/1000)+" seconds");
    }

    private static String convertText(String roman) throws IOException {
        char[] body = readFile(roman);
        if((body.length==1)&&(body[0]=='&')) //invalid input, halt program
            return "&";
        periodMover(body);
        String alethi = replaceLetters(body);
        return alethi;
    }

    /**
     * Load a text file's contents as a <code>char[]</code>.
     *
     * @param file The input file
     * @return The file contents, lowercased and wrapped in a leading and trailing '\n', as a <code>char[]</code>
     * @exception IOException IO Error
     */
    private static char[] readFile(String file) throws IOException {
        String whole = "";
        try {
            BufferedReader in = new BufferedReader(new FileReader(file));
            String str;
            while ((str = in.readLine()) != null) {
                whole = whole + str + '\n';
                //process(str);
            }
            in.close();
        } catch (IOException e) {
            System.out.println("Invalid file path");
            return "&".toCharArray();
        }
        whole="\n"+whole.toLowerCase(); //convert to lower - keeping an extra \n at the end and beginning for replacement ease of use, will get rid of it
        return whole.toCharArray();
    }

    private static void writeFile(String text, String destination) throws IOException {
        File file = new File(destination);
        boolean exist = file.createNewFile();
        if (!exist) {
            System.out.println("Output file already exists.");
            System.exit(0);
        } else {
            FileWriter fstream = new FileWriter(destination);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(text);
            out.close();
            System.out.println("File created successfully.");
        }
    }

    private static String allowedCharacters(String body) {
        //c, q, w, x, th, sh, ch - Forbidden; I assume no lowercases of the special characters (C, X)
        //\n, ' ', '.', C, S/s, T/t, X, - Allowed
        char[] library = new char[29];
        String[] pairs = {"th","sh","ch"}; //These shouldn't trigger unless I made a serious mistake in the "necessary" section.
        char[] body_array = body.toCharArray();
        String violations = "";
        int line = 1; //for all of those +1ers out there
        int target_size = 2;
        int search = body.length() - target_size;
        for(int j = 0;j<pairs.length;j++) {
            line = 1; //restart the count for each pair; otherwise "sh" and "ch" hits report inflated line numbers
            for(int i = 0; i<=search;i++)
                if(body_array[i]=='\n')
                    line++;
                else if(body.substring(i,i+target_size).equals(pairs[j]))
                    violations = violations + (line+":"+pairs[j]) + "; ";
        }
        library[0] = '\n';
        library[1] = ' ';
        library[2] = '.';
        library[3] = 'C';
        library[4] = 'S';
        library[5] = 'T';
        library[6] = 'X';
        int place = 7;
        for(int i = 97; i <=122; i++){
            if((i!=99)&&(i!=113)&&(i!=119)&&(i!=120)){ //c, q, w, and x
                library[place] = (char)i;
                place++;
            }
        }
        line = 1; //resetting
        for(int i = 0;i<body.length();i++)
            if(body_array[i]=='\n')
                line++;
            else if(Arrays.binarySearch(library,body_array[i])<0) //not in library
                violations = violations + (line+":"+body_array[i]) + "; ";
        return violations;
    }

    /**
     * In the Alethi alphabet, sentences start with a period '.' and don't end with anything.
     */
    private static void periodMover(char[] array) {
        int temp = 0;
        for(int i=0;i<array.length;i++) {
            if(array[i]=='.'){
                if(!(((array.length - i) >= 3)&&(array[i]==array[i+1])&&(array[i+1]==array[i+2]))) //ellipsis
                {
                    twistRight(array,temp,i);
                    i++;
                    while(i<array.length)
                        if(!inAlphabet(array[i]))
                            i++;
                        else
                            break; //Yes, the cardinal sin.
                    temp=i;
                }
                else if(((array.length-i)>=3)&&(array[i]==array[i+1])&&(array[i+1]==array[i+2]))
                    for(int j=0;j<3;j++)
                        twistRight(array,temp+j,i+j);
            }
            else if(array[i]=='\n')
                temp=i+1; //Doesn't allow sentences to continue after true line breaks. Enables no-period headers and whatnot.
        }
    }

    private static boolean inAlphabet(char character) {
        char[] library = new char[26];
        int place = 0;
        for(int i = 97; i <=122; i++){
            library[place] = (char)i;
            place++;
        }
        if(Arrays.binarySearch(library,character)>=0) //I felt embarrassed by my earlier search algorithm.
            return true;
        return false;
    }

    private static void twistRight(char[] array, int start, int end) {
        if (start==end)
            return;
        char a = array[start];
        char b;
        array[start] = array[end]; //'.', although this is generalized
        while(start!=end) {
            start++;
            b = array[start];
            array[start] = a;
            a = b;
        }
    }

    /**
     * Special characters: For t, use lower case t. For th, use capital T. For s, use lower case s. For sh, use capital S. For ch, use capital C. X will print a combination of k and s. For q and w, use your imagination. Technically speaking, q is a combination of k and u. W is basically a combination of a long u ("oo") and any other vowel: a e i o and short u ("uh")
     */
    private static String replaceLetters(char[] array) {
        String body = new String(array); //Ease of use
        //1.3.5-Threw in an If statement in the replace function to deal with space and \n at the same time
        //ph
        body = replace(body,"ph","f");
        //E at end - Some interference possible with C's
        body = replace(body,"use\n","yooz\n");
        body = replace(body,"used\n","yoozd\n"); //special case
        //Note: Need to make sure that plurals of e-enders are covered, i.e. wives.
        body = replace(body,"like\n","lahyk\n");
        body = replace(body,"ole\n","ohl\n"); //hyperbole will suffer
        body = replace(body,"ose\n","ohz\n");
        body = replace(body,"ame\n","eym\n");
        body = replace(body,"ese\n","eez\n");
        body = replace(body,"ave\n","eyv\n");
        body = replace(body,"eive\n","eev\n");
        body = replace(body,"vive\n","vahyv\n");
        body = replace(body,"ive\n","iv\n");
        body = replace(body,"eve\n","eev\n");
        body = replace(body,"ile\n","hyl\n");
        body = replace(body,"gle\n","guhl\n");
        body = replace(body,"base\n","beys\n"); //And now the ends-with function on scrabblefinder.com was useful
        body = replace(body,"case\n","ceys\n"); //Don't need to allow for c->k if c's are below
        body = replace(body,"chase\n","Ceys\n"); //ch == C
        body = replace(body,"erase\n","ihreys\n");
        body = replace(body,"ase\n","eez\n");
        body = replace(body,"olve\n","olv\n");
        body = replace(body,"alve\n","ahv\n");
        body = replace(body,"elve\n","elv\n");
        body = replace(body,"some\n","suhm\n");
        body = replace(body,"come\n","cuhm\n"); //Need to move this up
        body = replace(body,"ome\n","ohm\n");
        body = replace(body,"vate\n","vit\n");
        body = replace(body,"ate\n","eyt\n");
        body = replace(body,"tle\n","l\n"); //This is what dictionary.com said to do, and I live to serve
        body = replace(body,"ine\n","ahyn\n");
        body = replace(body,".one\n",".uhn\n");
        body = replace(body,"done\n","duhn\n");
        body = replace(body,"none\n","nuhn\n");
        body = replace(body,"one\n","ohn\n");
        body = replace(body,"ake\n","eyk\n");
        body = replace(body,"ope\n","ohp\n");
        String[] temp = {"en","st","un","c","f","g","s","t",""};
        body = replace(body,"ctable\n","kteybuhl\n"); //save the c's!
        for(int i = 0; i<temp.length;i++)
            if(temp[i].equals("c"))
                body = replace(body,"kable\n","keybuhl\n");
            else
                body = replace(body,temp[i]+"able\n",temp[i]+"eybuhl\n"); //keep the prefix, e.g. "table" -> "teybuhl"
        body = replace(body,"able\n","uhbuhl\n"); //This one is either "eybuhl" for a few short words or "uhbuhl" for all others
        body = replace(body,"rue\n","roo\n");
        body = replace(body,"ide\n","ahyd\n");
        body = replace(body,"ife\n","ahyf\n");
        body = replace(body,"ade\n","eyd\n");
        //ere - their vs there
        body = replace(body,"ere\n","eir\n");
        //ore, as in fore, bore
        body = replace(body,"ore","ohr");
        body = replace(body,".are\n",".ahr\n");
        body = replace(body,"are\n","air\n");
        body = replace(body,"oke\n","ohk\n");
        body = replace(body,"aire\n","air\n");
        body = replace(body,"ire\n","yuhr\n"); //?
        body = replace(body,"ype\n","ahyp\n");
        body = replace(body,"urge\n","urj\n");
        body = replace(body,"erge\n","urj\n"); //Not a mistake
        body = replace(body,"arge\n","hrj\n");
        body = replace(body,"orge\n","wrj\n");
        body = replace(body,"ime\n","ahym\n");
        body = replace(body,"ble\n","buhl\n");
        body = replace(body,"sle\n","ahyl\n");
        body = replace(body,"promise\n","promis\n");
        body = replace(body,"aise\n","eyz\n");
        body = replace(body,"ise\n","ahyz\n");
        body = replace(body,"lse\n","ls\n");
        body = replace(body,"igue\n","teeg\n");
        body = replace(body,"igue\n","teeg\n");
        body = replace(body,"sce\n","es\n");
        //gue - This one was irritating, might not be right
        body = replace(body,"logue\n","awg\n");
        body = replace(body,"gogue\n","awg\n");
        body = replace(body,".morgue\n",".mawrg\n");
        body = replace(body,".fugue\n",".fyoog\n");
        body = replace(body,".segue\n",".segwey\n");
        body = replace(body,"gue\n","eeg\n");
        //-nge
        body = replace(body,"nge\n","nj\n");
        //problem with sing vs singe not really being separable at the gerund-testing level
        body = replace(body,"sinjing\n","singing\n"); //comprehensive fix for gerund mishaps
        body = replace(body,"slinjing\n","slinging\n");
        body = replace(body,"strinjing\n","stringing\n");
        body = replace(body,"swinjing\n","swinging\n");
        body = replace(body,"brinjing\n","bringing\n");
        body = replace(body,"flinjing\n","flinging\n");
        body = replace(body,"prinjing\n","pringing\n");
        body = replace(body,".winjing\n",".winging\n");
        body = replace(body,".zinjing\n",".zinging\n");
        body = replace(body,".dinjing\n",".dinging\n");
        body = replace(body,".pinjing\n",".pinging\n");
        //END E's
        //s at end
        body = replace(body,"es\n","ez\n"); //Needs to go before c->s conversion, since C's are all soft S's
        //This is a big thing. I moved the c down mainly to allow for the s->z converter to do its job, and the judgement on whether or not this messes things up is pending.
        //START C 1.7 - moved so that higher number of characters in target gets preference, blocks kept cohesive
        //Stolen from the "necessary" bin.
        body = replace(body,"ch","C"); //Although both versions of C work, I'm assuming capitalized, so no lowercase c's are allowed in the text
        body = replace(body,"accent","aksent");
        body = replace(body,"exercise\n","eksersahyz\n");
        body = replace(body,".once",".wuhns");
        body = replace(body,"ucca","uhka");
        body = replace(body,"ucco","uhko");
        body = replace(body,"uccu","uhku");
        body = replace(body,".occ",".uhk");
        body = replace(body,"ucce","uhkse");
        body = replace(body,"ucci","uhksi");
        body = replace(body,"occup","okyuh"); //very special case
        body = replace(body,"occa","uhkah");
        body = replace(body,"occi","oksi");
        body = replace(body,"occe","ochee"); //?
        body = replace(body,"occo","okuh");
        body = replace(body,"occu","okuh"); //Just went down the list on http://www.morewords.com/contains/cc - Useful, if laborious
        body = replace(body,".acc",".aks"); //the dreaded double c's
        body = replace(body,".ecc",".eks");
        body = replace(body,".scie",".sahye"); //For Science!
        body = replace(body,"sciou","shuh"); //For Conscience!
        body = replace(body,"cious","shuhs"); //For Ithaca!
        body = replace(body,"scio","shuh");
        body = replace(body,"scie","shuh");
        body = replace(body,"cies","seez"); //prophecies
        body = replace(body,"ciez","seez"); //s->z already done
        body = replace(body,"acen","eysuhn"); //Don't get complacent
        body = replace(body,"ician","ishuhn"); //musician
        body = replace(body,"cism","sizuhm"); //anglicanism
        body = replace(body,"icise\n","uhsahyz\n");
        body = replace(body,"rcise\n","ruhsahyz\n");
        body = replace(body,"ciate\n","sheeeyt\n");
        body = replace(body,"cision\n","sizhuhn\n");
        body = replace(body,"cise\n","sahys\n");
        body = replace(body,"cist\n","sist\n"); //the substitution must keep the trailing \n like the target, or the line break is lost
        body = replace(body,"uce\n","us\n");
        body = replace(body,"uces\n","usez\n"); //z incorporated
        body = replace(body,"uced\n","usst\n"); //D's
        body = replace(body,"cial","shul");
        body = replace(body,".acq",".akw"); //might need refinement
        body = replace(body,"cque","ke");
        body = replace(body,"acquaint","uhkweynt");
        body = replace(body,"cing","sing");
        body = replace(body,"came\n","keym\n");
        body = replace(body,"came","kamuh"); //1.6.5 - odyssey test
        body = replace(body,"exce","ikse");
        body = replace(body,"excit","iksahyt");
        body = replace(body,"excis","eksahyz");
        body = replace(body,".acid\n",".asid\n");
        body = replace(body,".aci",".uhsi");
        body = replace(body,"ence","ens");
        body = replace(body,"ierce\n","eers\n");
        //body = replace(body,".ance",".ahns");
        body = replace(body,".trance",".trahns");
        body = replace(body,"dance\n","dahns\n");
        body = replace(body,"Cance","Cahns");
        body = replace(body,"cance","cahns");
        body = replace(body,"lance","lahns");
        body = replace(body,"ance\n","uhns\n");
        body = replace(body,"ici","isi"); //Sicily
        body = replace(body,"iec","ees"); //Piece/Peace -> Pees
        body = replace(body,"eac","ees");
        body = replace(body,"ice","ahys"); //Long S.
        body = replace(body,"cep","sep");
        body = replace(body,"cin","sin");
        body = replace(body,".cit",".sit");
        body = replace(body,"cip","sip");
        body = replace(body,"cif","suhf");
        body = replace(body,"ces","seez");
        body = replace(body,"cez\n","seez\n"); //In case of S->Z
        body = replace(body,"ce\n","s\n");
        body = replace(body,"icc","ik");
        body = replace(body,"icn","ikn");
        body = replace(body,"sce","se");
        body = replace(body,"sci","si");
        body = replace(body,"scy","sahy");
        //body = replace(body,"sco","sko");
        body = replace(body,"cea","sea");
        body = replace(body,"nci","nsi"); //might need refinement
        body = replace(body,"ncy","nsee");
        body = replace(body,"cei","see");
        body = replace(body,"cee","see");
        body = replace(body,"cent","sent"); //odyssey
        //starting with c
        body = replace(body,".cy",".sahy");
        body = replace(body,".cir",".sur");
        body = replace(body,".cid",".sahyd");
        body = replace(body,".ci",".si");
        body = replace(body,"ace","eys");
        body = replace(body,".cer",".sur");
        body = replace(body,".ce",".se");
        body = replace(body,"ck","k");
        body = replace(body,"sc","sk");
        body = replace(body,"cy","see"); //1.4.3 - si->see
        body = replace(body,"ci\n","sahy\n");
        body = replace(body,"ce","se");
        body = replace(body,"ca","ka");
        body = replace(body,"co","ko");
        body = replace(body,"cu","ku");
        body = replace(body,"ct","kt");
        body = replace(body,"cl","kl");
        body = replace(body,"cr","kr");
        body = replace(body,".c",".k"); //This can possibly leave lowercase c's in the text, although I think that all properly spelled words should be covered here.
        body = replace(body,"c\n","k\n");
        //END C'S
        //Not sure where to put this section
        //ss
        body = replace(body,"ss","s");
        //wh
        body = replace(body,"wha","ua");
        body = replace(body,"whe","ue");
        body = replace(body,"whi","ui");
        body = replace(body,"whu","uu");
        body = replace(body,"who\n","hu\n");
        //gh
        body = replace(body,"gha","gah"); //This section needs work
        body = replace(body,"gho","goh");
        body = replace(body,"ought","awt");
        body = replace(body,"though","thoh");
        body = replace(body,"bough","bou");
        body = replace(body,"cough","kof");
        body = replace(body,"gh\n","\n");
        body = replace(body,"gh","g");
        //to, too, two - Just a quick patch for those three words, not a general solution to any problem I can see
        body = replace(body,".to\n",".too\n");
        body = replace(body,".two\n",".too\n");
        //q at end
        body = replace(body,"q\n","k\n");
        //w at end
        body = replace(body,"ow\n","au\n");
        //.sy
        body = replace(body,".syr",".suhr"); //Moved up to e-enders
        body = replace(body,".syr",".sir");
        body = replace(body,".sly",".slahy");
        body = replace(body,".lying\n",".lahying\n");
        body = replace(body,".ly",".li");
        body = realReplace("qqq",body,"ay\n","ey\n"); //stopgap, might want to revisit
        //body = replace(body,"ey\n","ey\n");
        body = realReplace("qqq",body,"oy\n","oi\n");
        body = realReplace("qqq",body,"uy\n","ahy\n");
        body = realReplace("qqq",body,"y\n","ee\n"); //might need to be generalized in replace()
        //sz->siz - The coward's way out. I need to sit down and make this thing more cohesive
        body = replace(body,"sz\n","siz\n");
        //The annoying part is the hodge-podgeness of English. The only workable route may be just to demand phonetic spelling in cases like "Tow"
        //Necessary --Moved down to make ease-of-use conversions easier
        body = replace(body,"th","T");
        body = replace(body,"sh","S");
        //body = replace(body,"ch","C"); //took some liberties here, capitalized the C to make room for the c->k/s conversion
        body = replace(body,"x","X"); //Consistency - x is really a compound character of ks.
        body = replace(body,"q","ku");
        body = replace(body,"wa","ua");
        body = replace(body,"we","ue");
        body = replace(body,"wi","ui");
        body = replace(body,"wo","uo");
        body = replace(body,"wu","uu");
        body = replace(body,"w","u");
        //exception catcher
        if(debug_end_e){
            body = replace(body,"e\n","Q\n"); //Just for debugging
            body = replace(body,".TQ",".Te");
            body = replace(body,".bQ",".be");
            body = replace(body,".seQ",".seee");
            body = replace(body,".mQ",".me");
        }
        return body.substring(1,body.length()-1); //clipping first/last '\n'
    }

    private static String replace(String body, String target, String sub){
        return realReplace("",body,target,sub);
    }

    private static String realReplace(String sofar, String body, String target, String sub) {
        int target_size = target.length();
        int sub_size = sub.length();
        //'.'==' '
        // if(target.startsWith("."))
        // System.out.println(target);
        if(target.startsWith(".")){
            body = replace(body,(" "+target.substring(1,target_size)),(" "+sub.substring(1,sub_size)));
            body = replace(body,("\n"+target.substring(1,target_size)),("\n"+sub.substring(1,sub_size)));
        }
        if(target.endsWith("\n")){ //checks for spaces and for plurals, also does s->z conversion where necessary
            body = replace(body,(target.substring(0,target_size-1)+" "),(sub.substring(0,sub_size-1)+" ")); //space substitution
            if(sofar.length()<=2){ //that took longer than it should have. Anyone who can suggest improvements is welcome to try.
                if((!sofar.contains("z"))&&(!sofar.contains("l"))){ //I think contains() covers it. It saves time over endsWith() if it stops unnecessary calls to realReplace(), as long as it doesn't cut out possible permutations
                    if(!sofar.contains("i")) // s->z
                        if((target_size>=2)&&(target.charAt(target_size-2)!='s')&&(target.charAt(target_size-2)!='z')) //Double-checking s/z
                            if(target.charAt(target_size-2)=='e')
                                body = realReplace(sofar+="z",body,(target.substring(0,target_size-2)+"es\n"),(sub.substring(0,sub_size-1)+"ez\n")); //s->z
                            else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3)) //bug stopper
                                body = realReplace(sofar+="z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"z\n")); //s->z
                    //ly - It might need some work
                    if(target.equals("sly\n")) //special case
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    if(((target_size>=5)&&(!target.substring(target_size-5,target_size-1).equals("able")))||(target_size<5))
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n")); //ably
                    else
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    if((!sofar.contains("g"))&&(!sofar.contains("i"))&&(!sofar.contains("r"))){ //covers multiple
                        if(target_size>=4){ //gerunds, include \n or space
                            if((!target.endsWith("g\n"))&&(!target.endsWith("gs\n"))&&(!target.endsWith("gz"))) //leave no base uncovered
                                if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ie")))
                                    body = realReplace(sofar+="g",body,(target.substring(0,target_size-3)+"ying\n"),(sub.substring(0,sub_size-1)+"ing\n")); //replacing 'ie' before gerund
                                else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                                    body = realReplace(sofar+="g",body,(target.substring(0,target_size-2)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //removing 'e'
                        }else if((!target.endsWith("gs\n"))&&(!target.endsWith("gz"))) //no "ing\n" or s\z at end
                            body = realReplace(sofar+="g",body,(target.substring(0,target_size-1)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //no e, presumably ends in consonant
                        if((!sofar.contains("a"))&&(!sofar.contains("d"))) //ish
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ed")))||(target_size<3))
                                    body = realReplace(sofar+="i",body,(target.substring(0,target_size-1)+"ish\n"),(sub.substring(0,sub_size-1)+"ish\n"));
                        if(!sofar.contains("a")) //able
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                body = realReplace(sofar+="a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                            else if(target.equals("fly")||target.equals("unfly"))
                                body = realReplace(sofar+="a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                            else if(((target_size>=4)&&(target.substring(target_size-4,target_size-1).equals("ing")))||(target_size<4))
                                body = realReplace(sofar+="a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"eybuhl\n"));
                    }
                    if((!sofar.contains("g"))&&(!sofar.contains("d"))){ //covers multiple
                        if(target_size>=2) //d at end
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                if(target.charAt(target_size-2)=='e')
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"st\n"));
                                else if((target.charAt(target_size-2)!='s')||((target.substring(target_size-3,target_size-1).equals("ss"))))
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"st\n"));
                                else if(target.charAt(target_size-2)=='s')
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"ed\n"));
                                else if(target.substring(target_size-3,target_size-1).equals("se"))
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"ed\n"));
                        //er
                        if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                            body = realReplace(sofar+="r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e'
                        else
                            body = realReplace(sofar+="r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n"));
                    }
                    //Why do these need to be dealt with here?
                    //Because these permutations need to be available to figure out which \n grammars to apply
                    //ed, ish, ly, ing, able, edly, ishly, ably, lying, eding, abling
                    //Dirty method - add a recursion counter to replace()
                    //6 max - ed ish ly ing able z
                    //ablingly, lyingly - 3
                    //ablinger
                    //s-z, ly-l, ing-g, d-d, ish-i, able-a
                    //everything abides i, nothing abides s/l
                    //nevermind, not much likes i either
                    //a allows l/s/d,
                    //a forbids a, i
                    //d forbids d, i
                    //g forbids d, g, i, a
                    //i forbids s, g, i, a
                    //er-r
                    //r forbids g, i, a
                    //r is forbidden by s, l, g, d
                    //-ity?
                    //I think that forbiddance is total - no forbidden suffixes at any point before
                    //all of the checks for these are rather crude, but they are all-encompassing
                }
            }
        }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i+=(sub_size-target_size);
            }
        }
        return body;
    }
}

Edited January 16, 2012 by Kurkistan
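periodMover() implements the rule from its Javadoc (sentences start with a period and don't end with anything) in place on a char[], rotating each closing '.' back to the start of its sentence with twistRight(). For anyone following that logic, here is a simplified, String-based sketch of the same idea; it is not the posted method. It assumes every '.' ends a sentence (no ellipsis handling, unlike periodMover()) and, like the posted code, treats a line break as a sentence boundary so period-less headers pass through unchanged. The class and method names are hypothetical.

```java
public class PeriodMoverSketch {
    /**
     * Simplified sketch: moves each sentence-final '.' to the start of its sentence,
     * so "hello there. bye." becomes ".hello there .bye".
     * Assumptions: '.' only ends sentences, and a newline always starts a new one.
     */
    static String moveSentencePeriods(String text) {
        StringBuilder out = new StringBuilder();
        StringBuilder sentence = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch == '.') {
                out.append('.').append(sentence); // period goes in front of the sentence
                sentence.setLength(0);
            } else if (ch == '\n') {
                out.append(sentence).append('\n'); // no period seen: emit the line unchanged
                sentence.setLength(0);
            } else if (ch == ' ' && sentence.length() == 0) {
                out.append(ch); // spacing between sentences stays outside the sentence
            } else {
                sentence.append(ch);
            }
        }
        out.append(sentence); // trailing text without a period
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(moveSentencePeriods("hello there. bye."));
    }
}
```

Building a new String with StringBuilder avoids the repeated in-place rotation, at the cost of allocating a second buffer; for a single pass over a book-length text that trade-off is usually negligible.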
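The rule strings passed to replace() follow a convention that realReplace() expands: a leading '.' means "at the start of a word" (the code also tries ' ' and '\n' in its place), and a trailing '\n' means "at the end of a word" (it also tries ' '). Because the rules run in sequence, order matters: "chase\n" has to run before the generic "ase\n". Here is a sketch of that convention using java.util.regex lookarounds. It is an illustration under assumptions, not the posted implementation: word boundaries are taken to be space, newline, '.', or the text edges; the substitution uses the same markers as the target; and the plural/gerund/-ed permutations that realReplace() generates are not covered. Names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryReplaceSketch {
    // Leading '.' in a rule: not preceded by anything other than space, newline, or '.'.
    private static final String WORD_START = "(?<![^ \\n.])";
    // Trailing '\n' in a rule: not followed by anything other than space or newline.
    private static final String WORD_END = "(?![^ \\n])";

    static String boundaryReplace(String body, String target, String sub) {
        boolean atStart = target.startsWith(".");
        boolean atEnd = target.endsWith("\n");
        // Assumes sub carries the same markers as target, e.g. ".one\n" -> ".uhn\n".
        String core = target.substring(atStart ? 1 : 0, target.length() - (atEnd ? 1 : 0));
        String subCore = sub.substring(atStart ? 1 : 0, sub.length() - (atEnd ? 1 : 0));
        String regex = (atStart ? WORD_START : "") + Pattern.quote(core) + (atEnd ? WORD_END : "");
        return Pattern.compile(regex).matcher(body).replaceAll(Matcher.quoteReplacement(subCore));
    }

    // Applies rules in insertion order, so longer, more specific rules must be added first.
    static String applyRules(String body, Map<String, String> rules) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            body = boundaryReplace(body, rule.getKey(), rule.getValue());
        }
        return body;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("chase\n", "Ceys\n"); // specific rule first...
        rules.put("ase\n", "eez\n");    // ...then the general one
        rules.put(".one\n", ".uhn\n");
        rules.put("one\n", "ohn\n");
        System.out.println(applyRules("chase it\n.one stone", rules));
    }
}
```

Running "ase\n" first would turn "chase" into "cheez" and leave nothing for the "chase\n" rule to match. Each rule here is a single regex pass over the text, whereas the posted loop rebuilds the whole String with substring() on every match; that may be part of why the Odyssey run takes minutes, though this is unmeasured.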
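allowedCharacters() is a post-run sanity check: after conversion, nothing outside the font's alphabet should remain, meaning no c, q, w, or x and no leftover th/sh/ch digraphs. For comparison, here is a single-pass sketch that walks the text once with one line counter and returns the same "line:violation" entries as a list. It assumes the allowed set from the posted library array (newline, space, '.', C, S, T, X, and lowercase letters other than c, q, w, x); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class AllowedCharactersSketch {
    // Assumed allowed set, mirroring the posted library array.
    static boolean isAllowed(char ch) {
        if (ch == '\n' || ch == ' ' || ch == '.') return true;
        if (ch == 'C' || ch == 'S' || ch == 'T' || ch == 'X') return true;
        return ch >= 'a' && ch <= 'z' && ch != 'c' && ch != 'q' && ch != 'w' && ch != 'x';
    }

    // One pass, one line counter: reports forbidden digraphs and characters as "line:text".
    static List<String> findViolations(String body) {
        String[] pairs = {"th", "sh", "ch"};
        List<String> violations = new ArrayList<>();
        int line = 1;
        for (int i = 0; i < body.length(); i++) {
            char ch = body.charAt(i);
            if (ch == '\n') {
                line++;
                continue;
            }
            for (String pair : pairs) {
                if (body.startsWith(pair, i)) {
                    violations.add(line + ":" + pair);
                }
            }
            if (!isAllowed(ch)) {
                violations.add(line + ":" + ch);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(findViolations(".Ti iz fine\n.but wat about ch"));
    }
}
```

Replacing the sorted-array binarySearch with a direct range check keeps the allowed set readable in one place; if the font gains characters later, only isAllowed() needs to change.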
Turos (Author) Posted January 16, 2012

Quoting Kurkistan:
*Opens .doc with Alethi script at 72*
*Screen explodes*

Mwahahahaha! My evil plan of destruction has destroyed your computer! Now I need a new project.
Kurkistan Posted January 17, 2012 (edited)

Quoting Turos:
Mwahahahaha! My evil plan of destruction has destroyed your computer! Now I need a new project.

*Pulls vacuum-tube monitor out of basement* Ah ha! Your evil scheme has failed!

I think we may actually be reaching the endgame here. Someone who isn't me needs to go over a relatively long and diverse text with a fine-toothed comb to find phonetic errors, and we'll probably need to rearrange the order of the grammars at some point before the day is over, but barring massive oversights on my part, I think we have almost everything we need written into the program.

Updated "-ly" suffixes to cover all "-y" suffixes, reordered grammars, added various new grammars.

/**
 * Goal: Provide an easy means of transliterating Roman letters into Alethi script using Turos's font conventions.
 *
 *
 * @author Kurkistan
 * @date 01/16/2012
 * @version 1.7.4.5
 */
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.File;
import java.io.PrintWriter;
import java.io.IOException;
import java.util.Scanner;
import java.io.BufferedReader;
import java.util.Arrays;

public class AlethiTransliterator_1_7_4_5 {
    static boolean debug_char = true;
    static boolean debug_end_e = false;

    public static void main (String[] arg) throws IOException{
        Scanner input=new Scanner(System.in);
        System.out.print("Enter input file (full name of file in same directory): ");
        String temp = input.next();
        //temp = "Test.txt";
        final double startTime = System.currentTimeMillis();
        final double endTime;
        try {
            String alethi = convertText(temp);
            if(alethi.equals("&"))
                return;
            temp = "Alethi_"+temp;
            writeFile(alethi,temp);
            if(debug_char){
                String violations = allowedCharacters(alethi); //debugging blatant errors
                if(!violations.equals(""))
                    System.out.println("Unauthorized sections in text (Line:Violation):"+"\n"+violations);
            }
        } finally {
            endTime = System.currentTimeMillis();
        }
        final double duration = endTime - startTime;
        System.out.println("Execution time: "+(duration/1000)+" seconds");
    }

    private static String convertText(String roman) throws IOException {
        char[] body = readFile(roman);
        if((body.length==1)&&(body[0]=='&')) //invalid input, halt program
            return "&";
        periodMover(body);
        String alethi = replaceLetters(body);
        return alethi;
    }

    /**
     * Load a text file's contents as a <code>char[]</code>.
     *
     * @param file The input file
     * @return The file contents, lowercased and wrapped in a leading and trailing '\n', as a <code>char[]</code>
     * @exception IOException IO Error
     */
    private static char[] readFile(String file) throws IOException {
        String whole = "";
        try {
            BufferedReader in = new BufferedReader(new FileReader(file));
            String str;
            while ((str = in.readLine()) != null) {
                whole = whole + str + '\n';
                //process(str);
            }
            in.close();
        } catch (IOException e) {
            System.out.println("Invalid file path");
            return "&".toCharArray();
        }
        whole="\n"+whole.toLowerCase(); //convert to lower - keeping an extra \n at the end and beginning for replacement ease of use, will get rid of it
        return whole.toCharArray();
    }

    private static void writeFile(String text, String destination) throws IOException {
        File file = new File(destination);
        boolean exist = file.createNewFile();
        if (!exist) {
            System.out.println("Output file already exists.");
            System.exit(0);
        } else {
            FileWriter fstream = new FileWriter(destination);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(text);
            out.close();
            System.out.println("File created successfully.");
        }
    }

    private static String allowedCharacters(String body) {
        //c, q, w, x, th, sh, ch - Forbidden; I assume no lowercases of the special characters (C, X)
        //\n, ' ', '.', C, S/s, T/t, X, - Allowed
        char[] library = new char[29];
        String[] pairs = {"th","sh","ch"}; //These shouldn't trigger unless I made a serious mistake in the "necessary" section.
char[] body_array = body.toCharArray(); String violations = ""; int line = 1; //for all of those +1ers out there int target_size = 2; int search = body.length() - target_size; for(int j = 0;j<pairs.length;j++) for(int i = 0; i<=search;i++) if(body_array=='\n') line++; else if(body.substring(i,i+target_size).equals(pairs[j])) violations = violations + (line+":"+pairs[j]) + "; "; library[0] = '\n'; library[1] = ' '; library[2] = '.'; library[3] = 'C'; library[4] = 'S'; library[5] = 'T'; library[6] = 'X'; int place = 7; for(int i = 97; i <=122; i++){ if((i!=99)&&(i!=113)&&(i!=119)&&(i!=120)){ //c, q, w, and x library[place] = (char)i; place++; } } line = 1; //resetting for(int i = 0;i<body.length();i++) if(body_array=='\n') line++; else if(Arrays.binarySearch(library,body_array)<0) //not in library violations = violations + (line+":"+body_array) + "; "; return violations; } /** * In the Alethi alphabet, sentences start with a period '.' and don't end with anything. */ private static void periodMover(char[] array) { int temp = 0; for(int i=0;i<array.length;i++) { if(array=='.'){ if(!(((array.length - i) >= 3)&&(array==array[i+1])&&(array[i+1]==array[i+2]))) //ellipsis { twistRight(array,temp,i); i++; while(i<array.length) if(!inAlphabet(array)) i++; else break; //Yes, the cardinal sin. temp=i; } else if(((array.length-i)>=3)&&(array==array[i+1])&&(array[i+1]==array[i+2])) for(int j=0;j<3;j++) twistRight(array,temp+j,i+j); } else if(array=='\n') temp=i+1; //Doesn't allow sentences to continue after true line breaks. Enables no-period headers and whatnot. } } private static boolean inAlphabet(char character) { char[] library = new char[26]; int place = 0; for(int i = 97; i <=122; i++){ library[place] = (char)i; place++; } if(Arrays.binarySearch(library,character)>=0) //I felt embarrassed by my earlier search algorithm. 
            return true;
        return false;
    }

    private static void twistRight(char[] array, int start, int end) {
        if (start==end)
            return;
        char a = array[start];
        char b;
        array[start] = array[end]; //'.', although this is generalized
        while(start!=end) {
            start++;
            b = array[start];
            array[start] = a;
            a = b;
        }
    }

    /**
     * Special characters: For t, use lower case t. For th, use capital T. For s, use lower case s. For sh, use capital S. For ch, use c. X will print a combination of k and s. For q and w, use your imagination. Technically speaking, q is a combination of k and u. W is basically a combination of a long u ("oo") and any other vowel: a e i o and short u ("uh")
     */
    private static String replaceLetters(char[] array) {
        String body = new String(array); //Ease of use
        //1.3.5-Threw in an If statement in the replace function to deal with space and \n at the same time
        //ph
        body = replace(body,"ph","f");
        //anti-
        body = replace(body,".anti",".antahy");
        //E at end - Some interference possible with C's
        body = replace(body,"use\n","yooz\n");
        body = replace(body,"used\n","yoozd\n"); //special case
        //Note: Need to make sure that plurals of e-enders are covered, e.g. wives.
        body = replace(body,"like\n","lahyk\n");
        body = replace(body,"ole\n","ohl\n"); //hyperbole will suffer
        body = replace(body,"ose\n","ohz\n");
        body = replace(body,"ame\n","eym\n");
        body = replace(body,"ese\n","eez\n");
        body = replace(body,"ave\n","eyv\n");
        body = replace(body,"eive\n","eev\n");
        body = replace(body,"vive\n","vahyv\n");
        body = replace(body,"ive\n","iv\n");
        body = replace(body,"eve\n","eev\n");
        body = replace(body,"ile\n","hyl\n");
        body = replace(body,"gle\n","guhl\n");
        body = replace(body,"base\n","beys\n"); //And now the ends-with function on scrabblefinder.com was useful
        body = replace(body,"case\n","ceys\n"); //Don't need to allow for c->k if c's are below
        body = replace(body,"chase\n","Ceys\n"); //ch == C
        body = replace(body,"erase\n","ihreys\n");
        body = replace(body,"ase\n","eez\n");
        body = replace(body,"olve\n","olv\n");
        body = replace(body,"alve\n","ahv\n");
        body = replace(body,"elve\n","elv\n");
        body = replace(body,"some\n","suhm\n");
        body = replace(body,"come\n","cuhm\n"); //Need to move this up
        body = replace(body,"ome\n","ohm\n");
        body = replace(body,"vate\n","vit\n");
        body = replace(body,"ate\n","eyt\n");
        body = replace(body,"tle\n","l\n"); //This is what dictionary.com said to do, and I live to serve
        body = replace(body,"ine\n","ahyn\n");
        body = replace(body,".one\n",".uhn\n");
        body = replace(body,"done\n","duhn\n");
        body = replace(body,"none\n","nuhn\n");
        body = replace(body,"one\n","ohn\n");
        body = replace(body,"ake\n","eyk\n");
        body = replace(body,"ope\n","ohp\n");
        String[] temp = {"en","st","un","c","f","g","s","t",""};
        body = replace(body,"ctable\n","kteybuhl\n"); //save the c's!
        for(int i = 0; i<temp.length;i++) //assumed intent: only whole short words (stable, unable, able...) get "eybuhl"
            if(temp[i].equals("c"))
                body = replace(body,".cable\n",".keybuhl\n"); //the c->k rules only run further down
            else
                body = replace(body,"."+temp[i]+"able\n","."+temp[i]+"eybuhl\n");
        body = replace(body,"able\n","uhbuhl\n"); //This one is either "eybuhl" for a few short words or "uhbuhl" for all others
        body = replace(body,"rue\n","roo\n");
        body = replace(body,"ide\n","ahyd\n");
        body = replace(body,"ife\n","ahyf\n");
        body = replace(body,"ade\n","eyd\n");
        //ere - their vs there
        body = replace(body,"ere\n","eir\n");
        //ore, as in fore, bore
        body = replace(body,"ore","ohr");
        body = replace(body,".are\n",".ahr\n");
        body = replace(body,"are\n","air\n");
        body = replace(body,"oke\n","ohk\n");
        body = replace(body,"tire","tahyuhr"); //NOT \n or e
        body = replace(body,"aire\n","air\n");
        body = replace(body,"ire\n","yuhr\n"); //?
        body = replace(body,"ype\n","ahyp\n");
        body = replace(body,"urge\n","urj\n");
        body = replace(body,"erge\n","urj\n"); //Not a mistake
        body = replace(body,"arge\n","hrj\n");
        body = replace(body,"orge\n","wrj\n");
        body = replace(body,"ime\n","ahym\n");
        body = replace(body,"ble\n","buhl\n");
        body = replace(body,"sle\n","ahyl\n");
        body = replace(body,"promise\n","promis\n");
        body = replace(body,"aise\n","eyz\n");
        body = replace(body,"ise\n","ahyz\n");
        body = replace(body,"lse\n","ls\n");
        body = replace(body,"igue\n","teeg\n");
        body = replace(body,"sce\n","es\n");
        //gue - This one was irritating, might not be right
        body = replace(body,"logue\n","awg\n");
        body = replace(body,"gogue\n","awg\n");
        body = replace(body,".morgue\n",".mawrg\n");
        body = replace(body,".fugue\n",".fyoog\n");
        body = replace(body,".segue\n",".segwey\n");
        body = replace(body,"rgue\n","rgyoo\n");
        body = replace(body,"gue\n","eeg\n");
        //-nge
        body = replace(body,"nge\n","nj\n"); //problem with sing vs singe not really being separable at the gerund-testing level
        body = replace(body,"sinjing\n","singing\n"); //comprehensive fix for gerund mishaps
        body =
            replace(body,"slinjing\n","slinging\n");
        body = replace(body,"strinjing\n","stringing\n");
        body = replace(body,"swinjing\n","swinging\n");
        body = replace(body,"brinjing\n","bringing\n");
        body = replace(body,"flinjing\n","flinging\n");
        body = replace(body,"prinjing\n","pringing\n");
        body = replace(body,".winjing\n",".winging\n");
        body = replace(body,".zinjing\n",".zinging\n");
        body = replace(body,".dinjing\n",".dinging\n");
        body = replace(body,".pinjing\n",".pinging\n");
        //END E's
        //s at end - 1.7.4.5 -> unneeded, I think
        //body = replace(body,"es\n","ez\n"); //Needs to go before c->s conversion, since C's are all soft S's
        //This is a big thing. I moved the c down mainly to allow for the s->z converter to do its job, and the judgement on whether or not this messes things up is pending.
        //START C 1.7 - moved so that higher number of characters in target gets preference, blocks kept cohesive
        //Stolen from the "necessary" bin.
        body = replace(body,"ch","C"); //Although both versions of C work, I'm assuming capitalized, so no lowercase c's are allowed in the text
        body = replace(body,"accent","aksent");
        body = replace(body,"exercise\n","eksersahyz\n");
        body = replace(body,".once",".wuhns");
        body = replace(body,"ucca","uhka");
        body = replace(body,"ucco","uhko");
        body = replace(body,"uccu","uhku");
        body = replace(body,".occ",".uhk");
        body = replace(body,"ucce","uhkse");
        body = replace(body,"ucci","uhksi");
        body = replace(body,"occup","okyuh"); //very special case
        body = replace(body,"occa","uhkah");
        body = replace(body,"occi","oksi");
        body = replace(body,"occe","ochee"); //?
        body = replace(body,"occo","okuh");
        body = replace(body,"occu","okuh"); //Just went down the list on http://www.morewords.com/contains/cc - Useful, if laborious
        body = replace(body,"preface\n","prefis\n"); //special
        body = replace(body,"ace\n","eys\n");
        body = replace(body,"icise\n","uhsahyz\n");
        body = replace(body,"rcise\n","ruhsahyz\n");
        body = replace(body,"ciate\n","sheeeyt\n");
        body = replace(body,"cision\n","sizhuhn\n");
        body = replace(body,"cise\n","sahys\n");
        body = replace(body,"cist\n","sist\n"); //the substitute needs its own '\n', like the target
        body = replace(body,"uce\n","us\n");
        body = replace(body,"uces\n","usez\n"); //z incorporated
        body = replace(body,"uced\n","usst\n"); //D's
        body = replace(body,"came\n","keym\n");
        body = replace(body,"came","kamuh");
        body = replace(body,".acid\n",".asid\n");
        body = replace(body,".aci",".uhsi");
        body = replace(body,"ierce\n","eers\n");
        //body = replace(body,".ance",".ahns");
        body = replace(body,".trance",".trahns");
        body = replace(body,"dance\n","dahns\n");
        body = replace(body,"Cance","Cahns");
        body = replace(body,"cance","cahns");
        body = replace(body,"lance","lahns");
        body = replace(body,"ance\n","uhns\n");
        body = replace(body,"all\n","awl\n");
        body = replace(body,"ice\n","ahys\n"); //Long S. NOT sure about \n's
        body = replace(body,"ces\n","seez\n");
        body = replace(body,"cez\n","seez\n"); //In case of S->Z
        body = replace(body,"ce\n","s\n");
        body = replace(body,"ci\n","sahy\n");
        body = replace(body,".acc",".aks"); //the dreaded double c's
        body = replace(body,".ecc",".eks");
        body = replace(body,".scie",".sahye"); //For Science!
        body = replace(body,"sciou","shuh"); //For Conscience!
        body = replace(body,"cious","shuhs"); //For Ithaca!
        body = replace(body,"scio","shuh");
        body = replace(body,"scie","shuh");
        body = replace(body,"ply\n","plahy\n");
        body = replace(body,".by\n",".bahy\n");
        body = replace(body,".my\n",".mahy\n");
        body = replace(body,".die\n",".dahy\n");
        body = replace(body,".dye\n",".dahy\n");
        body = replace(body,".bye\n",".bahy\n"); //conflict
        body = replace(body,"hype\n","hahyp\n");
        body = replace(body,"hype","hahype");
        body = replace(body,"hypo","hahypo");
        body = replace(body,"hypn","hipn");
        body = replace(body,"hyphen","hahyfuhn");
        body = replace(body,"hyfen","hahyfuhn"); //ph->f
        body = replace(body,"yp","ip");
        body = replace(body,"cies","seez"); //prophecies
        body = replace(body,"ciez","seez"); //s->z already done
        body = replace(body,"iew","yoo");
        body = replace(body,".face",".feys");
        body = replace(body,"face","feys");
        body = replace(body,"acen","eysuhn"); //Don't get complacent
        body = replace(body,"ician","ishuhn"); //musician
        body = replace(body,"cism","sizuhm"); //anglicanism
        body = replace(body,"cial","shul");
        body = replace(body,".acq",".akw"); //might need refinement
        body = replace(body,"cque","ke");
        body = replace(body,"acquaint","uhkweynt");
        body = replace(body,"cing","sing");
        //1.6.5 - odyssey test
        body = replace(body,"exce","ikse");
        body = replace(body,"excit","iksahyt");
        body = replace(body,"excis","eksahyz");
        body = replace(body,"ence","ens");
        body = replace(body,"ici","isi"); //Sicily
        body = replace(body,"iec","ees"); //Piece/Peace -> Pees
        body = replace(body,"eac","ees");
        body = replace(body,"cep","sep");
        body = replace(body,"cin","sin");
        body = replace(body,".cit",".sit");
        body = replace(body,"cip","sip");
        body = replace(body,"cif","suhf");
        body = replace(body,"icc","ik");
        body = replace(body,"icn","ikn");
        body = replace(body,"sce","se");
        body = replace(body,"sci","si");
        body = replace(body,"scy","sahy");
        //body = replace(body,"sco","sko");
        body = replace(body,"cea","sea");
        body = replace(body,"nci","nsi"); //might need refinement
        body = replace(body,"ncy","nsee");
        body = replace(body,"cei","see");
        body = replace(body,"cee","see");
        body = replace(body,"cent","sent"); //odyssey
        //starting with c
        body = replace(body,".cy",".sahy");
        body = replace(body,".cir",".sur");
        body = replace(body,".cid",".sahyd");
        body = replace(body,".ci",".si");
        body = replace(body,".cer",".sur");
        body = replace(body,".ce",".se");
        body = replace(body,"ck","k");
        body = replace(body,"sc","sk");
        body = replace(body,"cy","see"); //1.4.3 - si->see
        body = replace(body,"ce","se");
        body = replace(body,"ca","ka");
        body = replace(body,"co","ko");
        body = replace(body,"cu","ku");
        body = replace(body,"ct","kt");
        body = replace(body,"cl","kl");
        body = replace(body,"cr","kr");
        body = replace(body,".c",".k"); //This can possibly leave lowercase c's in the text, although I think that all properly spelled words should be covered here.
        body = replace(body,"c\n","k\n");
        //END C'S
        //Not sure where to put this section
        //ss
        body = replace(body,"ss","s");
        //wh
        body = replace(body,"who\n","hu\n");
        body = replace(body,"where","hwair");
        body = replace(body,"whir","hwur");
        body = replace(body,"wh,","hw"); //Might need more permutations
        //gh
        body = replace(body,"gha","gah"); //This section needs work
        body = replace(body,"gho","goh");
        body = replace(body,"ought","awt");
        body = replace(body,"though","thoh");
        body = replace(body,"bough","bou");
        body = replace(body,"cough","kof");
        body = replace(body,"gh\n","\n");
        body = replace(body,"gh","g");
        //to, too, two - Just a quick patch for those three words, not a general solution to any problem I can see
        body = replace(body,".to\n",".too\n");
        body = replace(body,".two\n",".too\n");
        //q at end
        body = replace(body,"q\n","k\n");
        //w at end
        body = replace(body,"ow\n","au\n");
        //.sy
        body = replace(body,".syr",".suhr"); //Moved up to e-enders
        body = replace(body,".syr",".sir");
        body = replace(body,".sly",".slahy");
        body = replace(body,".lying\n",".lahying\n");
        body = replace(body,".ly",".li");
        //sz->siz - The coward's way out.
        //I need to sit down and make this thing more cohesive
        body = replace(body,"sz\n","siz\n");
        body = realReplace("qqq",body,"y\n","ee\n");
        body = realReplace("qqq",body,"ehee\n","ehy\n");
        body = realReplace("qqq",body,"ahee\n","ahy\n");
        body = realReplace("qqq",body,"eee\n","ey\n"); //fixing issues raised by y->ee as compared to other phonetics
        //The annoying part is the hodge-podgeness of English. The only workable route may be just to demand phonetic spelling in cases like "Tow"
        //Necessary --Moved down to make ease-of-use conversions easier
        body = replace(body,"th","T");
        body = replace(body,"sh","S");
        //body = replace(body,"ch","C"); //took some liberties here, capitalized the C to make room for the c->k/s conversion
        body = replace(body,"x","X"); //Consistency - x is really a compound character of ks.
        body = replace(body,"q","ku");
        /*
        body = replace(body,"wa","ua"); //Unnecessary? I think not! I'm not sure why, but no.
        body = replace(body,"we","ue");
        body = replace(body,"wi","ui");
        body = replace(body,"wo","uo");
        body = replace(body,"wu","uu");
        */
        body = replace(body,"w","u");
        //exception catcher
        if(debug_end_e){
            body = replace(body,"e\n","Q\n"); //Just for debugging
            body = replace(body,".TQ",".Te");
            body = replace(body,".bQ",".be");
            body = replace(body,".seQ",".seee");
            body = replace(body,".mQ",".me");
            body = replace(body,"eQ\n","ee\n");
            body = replace(body,"Qy\n","ey\n");
            body = replace(body,".hQ",".he");
            body = replace(body,".shQ",".she");
        }
        return body.substring(1,body.length()-1); //clipping first/last '\n'
    }

    private static String replace(String body, String target, String sub){
        return realReplace("",body,target,sub);
    }

    private static String realReplace(String sofar, String body, String target, String sub) {
        int target_size = target.length();
        int sub_size = sub.length();
        //'.'==' '
        // if(target.startsWith("."))
        // System.out.println(target);
        if(target.startsWith(".")){
            body = replace(body,(" "+target.substring(1,target_size)),(" "+sub.substring(1,sub_size)));
            body = replace(body,("\n"+target.substring(1,target_size)),("\n"+sub.substring(1,sub_size)));
        }
        if(target.endsWith("\n")){ //checks for spaces and for plurals, also does s->z conversion where necessary
            body = replace(body,(target.substring(0,target_size-1)+" "),(sub.substring(0,sub_size-1)+" ")); //space substitution
            if(sofar.length()<=2){ //that took longer than it should have. Anyone who can suggest improvements is welcome to try.
                if((!sofar.contains("z"))&&(!sofar.contains("l"))){ //I think contains() covers it. It saves time over endsWith() if it stops unnecessary calls to realReplace(), as long as it doesn't cut out possible permutations
                    if(!sofar.contains("i")) // s->z
                        if((target_size>=2)&&(target.charAt(target_size-2)!='s')&&(target.charAt(target_size-2)!='z')) //Double-checking s/z
                            if(target.charAt(target_size-2)=='e')
                                body = realReplace(sofar+="z",body,(target.substring(0,target_size-2)+"es\n"),(sub.substring(0,sub_size-1)+"ez\n")); //s->z
                            else if(((target_size>=2)&&(target.substring(target_size-2,target_size-1).equals("y")))||(target_size<3)) //bug stopper
                                body = realReplace(sofar+="z",body,(target.substring(0,target_size-2)+"ies\n"),(sub.substring(0,sub_size-1)+"iez\n")); //s->z
                            else
                                body = realReplace(sofar+="z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"z\n")); //s->z
                    /*
                    //y
                    body = realReplace("qqq",body,"ay\n","ey\n"); //stopgap, might want to revisit
                    body = replace(body,"ey\n","ey\n");
                    body = realReplace("qqq",body,"oy\n","oi\n");
                    body = realReplace("qqq",body,"uy\n","ahy\n");
                    body = realReplace("qqq",body,"y\n","ee\n"); //might need generalized in replace()
                    body = replace(body,"ty","tahy");
                    */
                    //ly, focus on y as of 1.7.4.3 - It might need some work
                    if(target.equals("sly\n")) //special case
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    if((target_size>=5)&&(target.substring(target_size-5,target_size-1).equals("able")))
                        body =
                            realReplace(sofar+="l",body,(target.substring(0,target_size-2)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n")); //ably
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='a'))
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-2)+"ey\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-1)+"y\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='o'))
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-1)+"i\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='u'))
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-2)+"ahy\n"));
                    else
                        body = realReplace(sofar+="l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    if((!sofar.contains("g"))&&(!sofar.contains("i"))&&(!sofar.contains("r"))){ //covers multiple
                        if(target_size>=4){ //gerunds, include \n or space
                            if((!target.endsWith("g\n"))&&(!target.endsWith("gs\n"))&&(!target.endsWith("gz\n"))) //leave no base uncovered; targets here always end in '\n'
                                if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ie")))
                                    body = realReplace(sofar+="g",body,(target.substring(0,target_size-3)+"ying\n"),(sub.substring(0,sub_size-1)+"ing\n")); //replacing 'ie' before gerund
                                else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                                    body = realReplace(sofar+="g",body,(target.substring(0,target_size-2)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //removing 'e'
                        }else if((!target.endsWith("gs\n"))&&(!target.endsWith("gz\n"))) //no "ing\n" or s\z at end
                            body = realReplace(sofar+="g",body,(target.substring(0,target_size-1)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //no e, presumably ends in consonant
                        if((!sofar.contains("a"))&&(!sofar.contains("d"))) //ish
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ed")))||(target_size<3))
                                    body = realReplace(sofar+="i",body,(target.substring(0,target_size-1)+"ish\n"),(sub.substring(0,sub_size-1)+"ish\n"));
                        if(!sofar.contains("a")) //able
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                body = realReplace(sofar+="a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                            else if(target.equals("fly\n")||target.equals("unfly\n")) //targets here always end in '\n'
                                body = realReplace(sofar+="a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                            else if(((target_size>=4)&&(target.substring(target_size-4,target_size-1).equals("ing")))||(target_size<4))
                                body = realReplace(sofar+="a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"eybuhl\n"));
                    }
                    if((!sofar.contains("g"))&&(!sofar.contains("d"))){ //covers multiple
                        if(target_size>=2) //d at end
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                if(target.charAt(target_size-2)=='e')
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"st\n"));
                                else if((target.charAt(target_size-2)!='s')||((target.substring(target_size-3,target_size-1).equals("ss"))))
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"st\n"));
                                else if(target.charAt(target_size-2)=='s')
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"ed\n"));
                                else if(target.substring(target_size-3,target_size-1).equals("se"))
                                    body = realReplace(sofar+="d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"ed\n"));
                        //er
                        if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                            body =
                                realReplace(sofar+="r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e'
                        else
                            body = realReplace(sofar+="r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n"));
                    }
                    //Why do these need to be dealt with here?
                    //Because these permutations need to be available to figure out which \n grammars to apply
                    //ed, ish, ly, ing, able, edly, ishly, ably, lying, eding, abling
                    //Dirty method - add a recursion counter to replace()
                    //6 max - ed ish ly ing able z
                    //ablingly, lyingly - 3
                    //ablinger
                    //s-z, ly-l, ing-g, d-d, ish-i, able-a
                    //everything abides i, nothing abides s/l
                    //nevermind, not much likes i either
                    //a allows l/s/d,
                    //a forbids a, i
                    //d forbids d, i
                    //g forbids d, g, i, a
                    //i forbids s, g, i, a
                    //er-r
                    //r forbids g, i, a
                    //r is forbidden by s, l, g, d
                    //I think that forbiddance is total - no forbidden suffixes at any point before
                }
            }
        }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i+=(sub_size-target_size);
            }
        }
        return body;
    }
}

Edited January 17, 2012 by Kurkistan
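The comment block at the end of realReplace weighs a recursion counter against per-suffix "forbids" rules. A minimal, hypothetical Java sketch of the counter idea, assuming a depth limit plus only two of the listed rules (no suffix is used twice; nothing may follow -s or -ly). The class name SuffixStacker and its suffix list are illustrative only, and it concatenates plain strings instead of doing the spelling and phonetic rewrites that realReplace performs:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of AlethiTransliterator: enumerates suffix stacks
// on a base word up to a fixed depth, using plain string concatenation only.
class SuffixStacker {
    // Assumed suffix list; the real code also rewrites spellings (drops 'e', y -> i, ...).
    static final String[] SUFFIXES = {"s", "ed", "ing", "ly", "ish", "able", "er"};

    /** Returns the base word plus every variant built from at most maxDepth distinct suffixes. */
    static Set<String> variants(String base, int maxDepth) {
        Set<String> out = new LinkedHashSet<>();
        expand(base, new ArrayList<>(), maxDepth, out);
        return out;
    }

    private static void expand(String word, List<String> used, int remaining, Set<String> out) {
        out.add(word);
        if (remaining == 0) {
            return; // recursion counter exhausted
        }
        if (!used.isEmpty()) {
            String last = used.get(used.size() - 1);
            if (last.equals("s") || last.equals("ly")) {
                return; // "nothing abides s/l": treat -s and -ly as word-final
            }
        }
        for (String suffix : SUFFIXES) {
            if (used.contains(suffix)) {
                continue; // never stack the same suffix twice
            }
            used.add(suffix);
            expand(word + suffix, used, remaining - 1, out);
            used.remove(used.size() - 1); // backtrack (remove by index)
        }
    }

    public static void main(String[] args) {
        Set<String> forms = variants("help", 3);
        System.out.println(forms.contains("helpingly")); // true
        System.out.println(forms.contains("helpsed"));   // false: nothing follows -s
    }
}
```

A depth limit keeps the number of generated forms bounded even though no pairwise conflicts are checked, which is the efficiency trade-off the comments describe.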
Turos Posted January 17, 2012 Author
On it!
Turos Posted January 17, 2012 Author (edited)

It is a little difficult to translate into a phonetics system another person has devised when my own system of phonetics is different. But I'll look at your code to get an idea.

Also, here's a list of English suffixes, which might help for finding rarer ones: http://www.michigan-proficiency-exams.com/suffix-list.html

tion > Sun, should also affect 'tionate', 'tionately' and 'tioning'.
sion > Sun, should also affect 'sionate' and 'sionately'.
tech > tek
ocean > oSun
ture > Cur
advance > advans, not advuhns
row > ro, not rau
low > lo, not lau
age\n > aij\n
'ged\n' > 'jed'
'ridge' could be 'rij' instead of 'ridge' (or instead of 'ridje' if the above suggestion were changed)
ques\n > ks\n, not kuues\n, but this should not interfere with 'question'.
'que\n' > 'k\n', but this should not interfere with 'question'.
'reading' becomes 'reeyding', but 'reader' doesn't change.
strategies should not become 'strategkiez'.
possibilities should not become 'posibilitkiez'.
ture > Cur
ely\n > lee\n
'typically' should have the -lee suffix, but this should not interfere with the word 'ally' when a change is made.
'specific' should not be 'spesuhfik', but 'spesifik'.
tual\n > Cual\n
'one' should be 'uuhn', not 'uhn' I guess... kind of a weird one without long and short vowel characters.
'while' could be 'uhyl' instead of 'uhhyl'.
Here's a tough one: 'disciplines' should have 'plinez' and not 'plahynez', but 'plahynez' is perfect for 'splines'.
'insight' should be 'insahyt', not 'insigt'.
'rightly' should be 'rahytlee', not 'rigtlee'.
'wrongly' should be 'ronglee', not 'rongly'.
'deciding' should be 'desahyding', not 'dekahyding'.
'services' should be 'servisez', not 'servahysez'.
'associated' should be 'asoSiated', not 'asociated'. Not sure how that 'c' got through.
'accomplishing' should be 'akompliSing', not 'aksompliSing'.
'highly' should be 'hahylee', not 'hilee'.
'have' should be 'hav', not 'heyv'.
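Several of these items ('ques\n' and 'que\n' versus 'question') are about matching a pattern only at the end of a word. The transliterator marks word ends with a trailing '\n' and expands them to spaces inside realReplace; a minimal sketch of the same idea using java.util.regex word boundaries instead (a swapped-in technique, with a hypothetical two-rule table named WordEndRules) could look like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical rule table, not part of the transliterator: regex -> replacement.
class WordEndRules {
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        // "\\b" is a regex word boundary: it matches before a space, comma, period or
        // the end of the text, but not between two letters, so "question" is skipped.
        RULES.put("ques\\b", "ks");
        RULES.put("que\\b", "k");
    }

    /** Applies the rules in insertion order, like the chain of replace() calls. */
    static String apply(String text) {
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            text = text.replaceAll(rule.getKey(), rule.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(apply("techniques, unique question")); // techniks, unik question
    }
}
```

Because \b also fires before punctuation, a word followed by a comma is still treated as word-final, which the '\n'-plus-space convention does not cover on its own.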
'references' should be 'referensez', not 'referenseez'.
'quickly' should be 'kuiklee', not 'kuuiklee'. This one is not a big deal, just has an extra letter.
'tacit' > 'tasit'. It's a word that didn't get the 'c' converted.
'practice' should be 'praktis', not 'practahys'.
'quite' should be 'kuahyt', not 'kuuite'. Notice the extra 'u' again.

There was a reference to T.S. Eliot. As you can imagine, this messed with the punctuation. Don't know how to even suggest tackling something like this... Also, abbreviations tend to have extra periods. Maybe a list of common abbreviations would be good to include for the purpose of removing those extra periods. Sounds hard to me though. Here are a few that are common in books, used in referencing other works: p. > page; pg. > page; pgs. > pages; lit. > literally; no. > number (man, this one sucks...); who knows what others... Maybe these are a "that's too bad" scenario, haha.

I found an instance where a sentence was indented with the tab key. When the period was moved to the front, it was placed before the indentation, leaving a gap before the first word of the sentence.

I found one case where the word 'poetry' was translated to 'poetree', but a few words later, another instance of the same word followed by a comma was not converted. Maybe commas mess with the '-try' suffix? Perhaps removing commas first thing might be beneficial.

Also, is it possible to convert all number characters into their spelled forms? Actually, maybe I will add number symbols. That will make it easier.

Is it possible to remove all extra characters such as ' , ; : @ # $ % ^ & * ( ) - _ = + / \ | ` ~ < > [ ] { }?

Hope this helps! And regardless of the existing glitches, this program is amazing. I see lots of complex words transliterated perfectly throughout the article. Great work, man!

Edited January 17, 2012 by Turos
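The abbreviation idea could be tried as a pre-pass that runs before periodMover, so an abbreviation's period is never mistaken for a sentence end. A minimal sketch, assuming lower-case input and only the abbreviations listed above; 'no.' is deliberately left out because it cannot be told apart from the word "no" ending a sentence, and the class name AbbreviationExpander is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-pass, not part of the transliterator.
class AbbreviationExpander {
    // Longer forms first so "pgs." is handled before "pg." and "p.".
    static final Map<String, String> ABBREVIATIONS = new LinkedHashMap<>();
    static {
        ABBREVIATIONS.put("pgs.", "pages");
        ABBREVIATIONS.put("pg.", "page");
        ABBREVIATIONS.put("p.", "page");
        ABBREVIATIONS.put("lit.", "literally");
    }

    /** Expands the listed abbreviations; expects lower-case text, as readFile() produces. */
    static String expand(String text) {
        for (Map.Entry<String, String> e : ABBREVIATIONS.entrySet()) {
            // \b stops "p." from matching the end of "ship."; quote() escapes the '.'
            Pattern p = Pattern.compile("\\b" + Pattern.quote(e.getKey()));
            text = p.matcher(text).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(expand("see p. 12 and pgs. 4")); // see page 12 and pages 4
    }
}
```

Anything not in the table (initials such as "T.S.") still passes through unchanged, so this only narrows the problem rather than solving it.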
Kurkistan Posted January 17, 2012 (edited)

Ow. Very thorough. Thank you very much for doing this: I doubt that I could have stood going through another transliteration of the Odyssey looking for errors.
I see much work ahead, but at least the end is in sight. *knocks on wood*

The system of phonetics that I'm using is the one used on Dictionary.com, simply because of ease of use and consistency.

I'm going to say goodbye to efficiency for now and just tack on all of those suffixes without looking for necessary conflicts between them. A three-suffix limit should accomplish the job, although it will be less efficient than looking for individual conflicts.

If you do end up adding number characters, be sure to tell me so that I can add them to the "allowed" list when removing forbidden characters. I'll whip something up to remove forbidden characters, but I think that abbreviations and acronyms will just have to go the way of the dinosaurs for now, given the complexity involved in fixing them and the relative ease with which they can be avoided. "Poetry," was an example of the comma not being recognized, so getting rid of commas would solve that.

I do warn you: I'm going to take a bit of a break for now, and I'll probably start working on this in a few hours at the earliest. I'm a bit burned out just now.

EDIT: Looking at the suffix list, I think I'll just leave well enough alone for now. Most of them are just the ends of existing words, not "tacked on."

Edited January 17, 2012 by Kurkistan
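Removing forbidden characters is simplest as a whitelist, so that adding number characters later only means widening one condition. A minimal sketch, assuming it runs on the lower-cased text before any letter replacement; the class CharacterFilter and method keepAllowed are hypothetical names, not the method used in the actual program:

```java
// Hypothetical whitelist filter, not the transliterator's own removal method.
class CharacterFilter {
    /**
     * Keeps a-z, space, newline and '.', plus '0'-'9' when allowDigits is true.
     * Everything else (commas, apostrophes, brackets, ...) is dropped.
     */
    static String keepAllowed(String text, boolean allowDigits) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            boolean allowed = (ch >= 'a' && ch <= 'z')
                    || ch == ' ' || ch == '\n' || ch == '.'
                    || (allowDigits && ch >= '0' && ch <= '9');
            if (allowed) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepAllowed("poetry, isn't it?", false)); // poetry isnt it
        System.out.println(keepAllowed("page 12!", true));           // page 12
    }
}
```

Dropping the comma first means "poetry," reaches the word-end rules as "poetry", so it gets the same treatment as the bare word.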
Turos Posted January 17, 2012 Author (edited)

Shoot, if I were you, I'd take a week off after all that logic work.

EDIT: Ignore this one. A comma came after the word: 'wrongly' should be 'ronglee', not 'rongly'.

Edited January 18, 2012 by Turos
Kurkistan Posted January 18, 2012 (edited)

That was fun. Thanks again for putting in all of that work. Now you get to re-check everything to make sure my fixes didn't mess anything else up! Yeah!

I probably need to sit down and reorganize the grammars to eliminate interference, which was the cause of a fair amount of your issues, but I'm too close to it right now.

I didn't get your tab problem, so if you still have it for this version, then please send me the before and after text files that contain that specific error.

I also disagree with your categorical "ged\n"->"jed\n". There's some nuance there.

Fixed all of Turos's most recent bugs, added in "pp" rules, as well as rules for suffixes of words ending in 'p'.

EDIT: Deleted some of the old versions to make room in my attachments.

/**
 * Goal: Provide an easy means of transliterating Roman letters into Alethi script using Turos's font conventions.
* * * @author Kurkistan, with significant developmental input from Turos * @date 01/18/2012 * @version 1.8.5 */ import java.io.FileReader; import java.io.FileWriter; import java.io.BufferedWriter; import java.io.InputStreamReader; import java.io.File; import java.io.PrintWriter; import java.io.IOException; import java.util.Scanner; import java.io.BufferedReader; import java.util.Arrays; public class AlethiTransliterator_1_8_5 { static boolean debug_char = false; static boolean debug_end_e = false; public static void main (String[] arg) throws IOException{ Scanner input=new Scanner(System.in); System.out.print("Enter input file (full name of file in same directory): "); String temp = input.next(); //temp = "Test.txt"; final double startTime = System.currentTimeMillis(); final double endTime; try { String alethi = convertText(temp); if(alethi.equals("&")) return; temp = "Alethi_"+temp; writeFile(alethi,temp); if(debug_char){ String violations = allowedCharacters(alethi); //debugging blatant errors if(!violations.equals("")) System.out.println("Unauthorized sections in text (Line:Violation):"+"\n"+violations); } } finally { endTime = System.currentTimeMillis(); } final double duration = endTime - startTime; System.out.println("Execution time: "+(duration/1000)+" seconds"); } private static String convertText(String roman) throws IOException { char[] body = readFile(roman); if((body.length==1)&&(body[0]=='&')) //invalid input, halt program return "&"; periodMover(body); roman = new String(body); if(!debug_char) roman = removeCharacters(roman); String alethi = replaceLetters(roman); return alethi; } /** * Load a text file contents as a <code>String<code>. 
* * @param file The input file * @return The file contents as a <code>String</code> * @exception IOException IO Error */ private static char[] readFile(String file) throws IOException { String whole = ""; try { BufferedReader in = new BufferedReader(new FileReader(file)); String str; while ((str = in.readLine()) != null) { whole = whole + str + '\n'; //process(str); } in.close(); } catch (IOException e) { System.out.println("File not in directory or misspelled."); return "&".toCharArray(); } whole="\n"+whole.toLowerCase(); //convert to lower - keeping an extra \n at the end and beginning for replacement ease of use, will get rid of it return whole.toCharArray(); } private static void writeFile(String text, String destination) throws IOException { File file = new File(destination); boolean exist = file.createNewFile(); if (!exist) { System.out.println("Output file already exists."); System.exit(0); } else { FileWriter fstream = new FileWriter(destination); BufferedWriter out = new BufferedWriter(fstream); out.write(text); out.close(); System.out.println("File created successfully."); } } private static String allowedCharacters(String body) { //c, q, w, x, th, sh, ch - Forbidden; I assume no lowercases of the special characters (C, X) //\n, ' ', '.', C, S/s, T/t, X, - Allowed char[] library = new char[29]; String[] pairs = {"th","sh","ch"}; //These shouldn't trigger unless I made a serious mistake in the "necessary" section. 
        char[] body_array = body.toCharArray();
        String violations = "";
        int line = 1; //for all of those +1ers out there
        int target_size = 2;
        int search = body.length() - target_size;
        for(int j = 0;j<pairs.length;j++){
            line = 1; //restart the line count for each pair
            for(int i = 0; i<=search;i++)
                if(body_array[i]=='\n')
                    line++;
                else if(body.substring(i,i+target_size).equals(pairs[j]))
                    violations = violations + (line+":"+pairs[j]) + "; ";
        }
        library[0] = '\n';
        library[1] = ' ';
        library[2] = '.';
        library[3] = 'C';
        library[4] = 'S';
        library[5] = 'T';
        library[6] = 'X';
        int place = 7;
        for(int i = 97; i <=122; i++){
            if((i!=99)&&(i!=113)&&(i!=119)&&(i!=120)){ //c, q, w, and x
                library[place] = (char)i;
                place++;
            }
        }
        //library is in ascending order, which Arrays.binarySearch requires
        line = 1; //resetting
        for(int i = 0;i<body.length();i++)
            if(body_array[i]=='\n')
                line++;
            else if(Arrays.binarySearch(library,body_array[i])<0) //not in library
                violations = violations + (line+":"+body_array[i]) + "; ";
        return violations;
    }

    private static String removeCharacters(String body) {
        char[] library = new char[56];
        library[0] = '\t'; //tab
        library[1] = '\n';
        library[2] = ' ';
        library[3] = '.';
        int place = 4;
        for(int i = 65; i <=90; i++){
            library[place] = (char)i;
            place++;
        }
        for(int i = 97; i <=122; i++){
            library[place] = (char)i;
            place++;
        }
        for(int i = 0; i < body.length(); i++)
            if(Arrays.binarySearch(library,body.charAt(i))<0){ //I felt embarrassed by my earlier search algorithm.
                body = body.substring(0,i)+body.substring(i+1,body.length());
                i--;
            }
        return body;
    }

    /**
     * In the Alethi alphabet, sentences start with a period '.' and don't end with anything.
     */
    private static void periodMover(char[] array) {
        int temp = 0;
        for(int i=0;i<array.length;i++) {
            if(array[i]=='.'){
                if(!(((array.length - i) >= 3)&&(array[i]==array[i+1])&&(array[i+1]==array[i+2]))) //ellipsis
                {
                    twistRight(array,temp,i);
                    i++;
                    while(i<array.length)
                        if(!inAlphabet(array[i]))
                            i++;
                        else
                            break; //Yes, the cardinal sin.
                    temp=i;
                }
                else if(((array.length-i)>=3)&&(array[i]==array[i+1])&&(array[i+1]==array[i+2]))
                {
                    for(int j=0;j<3;j++)
                        twistRight(array,temp+j,i+j);
                    i+=3;
                    while(i<array.length)
                        if(!inAlphabet(array[i]))
                            i++;
                        else
                            break; //Yes, the cardinal sin.
                    temp=i;
                }
            }
            else if(array[i]=='\n')
                temp=i+1; //Doesn't allow sentences to continue after true line breaks. Enables no-period headers and whatnot.
        }
    }

    private static boolean inAlphabet(char character) {
        char[] library = new char[26];
        int place = 0;
        for(int i = 97; i <=122; i++){
            library[place] = (char)i;
            place++;
        }
        if(Arrays.binarySearch(library,character)>=0) //I felt embarrassed by my earlier search algorithm.
            return true;
        return false;
    }

    private static void twistRight(char[] array, int start, int end) {
        if (start==end)
            return;
        char a = array[start];
        char b;
        array[start] = array[end]; //'.', although this is generalized
        while(start!=end) {
            start++;
            b = array[start];
            array[start] = a;
            a = b;
        }
    }

    public static void test() {
        String body = "\nsnapping snapper snappily snappy snaps snap snapped snappable snappably\n"; //snapping snapper snappily snappy snaps snap snapped snappable snappably.
        String target = "ap\n";
        String sub = "op\n";
        System.out.println(replace(body,target,sub));
        int target_size = target.length();
        int sub_size = sub.length();
        String sofar = "";
        if((target_size>=2)&&(target.charAt(target_size-2)=='p')){
            body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
            body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
        }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i+=(sub_size-target_size);
            }
        }
        System.out.println(body);
    }

    /**
     * Special characters: For t, use lower case t. For th, use capital T. For s, use lower case s. For sh, use capital S. For ch, use c.
     * X will print a combination of k and s. For q and w, use your imagination. Technically speaking, q is a combination of k and u. W is basically a combination of a long u ("oo") and any other vowel: a e i o and short u ("uh")
     */
    private static String replaceLetters(String body) {
        //Ease of use
        //1.3.5-Threw in an If statement in the replace function to deal with space and \n at the same time
        //ph
        body = replace(body,"ph","f");
        //anti-
        body = replace(body,".anti",".antahy");
        //wh
        body = replace(body,"who\n","hoo\n");
        body = replace(body,"where","huair"); //changed w to u
        body = replace(body,"whir","huur");
        body = replace(body,"wh","hu"); //Might need more permutations
        body = replace(body,".accr",".uhkr"); //many many many
        body = replace(body,".acci",".aksi");
        body = replace(body,".accord",".uhkawrd");
        body = replace(body,".accomp",".uhkuhmp");
        body = replace(body,".acco",".uhko");
        body = replace(body,".accustom\n",".uhkuhstuhm\n");
        body = replace(body,".accolade\n",".akuhleyd\n");
        body = replace(body,".accus",".uhkyooz");
        body = replace(body,".accurs",".uhkurs");
        body = replace(body,".accur",".akyer");
        body = replace(body,".accum",".uhkyoom");
        body = replace(body,".accout",".uhkoot");
        body = replace(body,".accoun",".uhkount");
        body = replace(body,".acce",".akse");
        //the dreaded double c's
        body = replace(body,".ecc",".eks");
        body = replace(body,"ucca","uhka");
        body = replace(body,"ucco","uhko");
        body = replace(body,"uccu","uhku");
        body = replace(body,".occ",".uhk");
        body = replace(body,"ucce","uhkse");
        body = replace(body,"ucci","uhksi");
        body = replace(body,"occup","okyuh"); //very special case
        body = replace(body,"occa","uhkah");
        body = replace(body,"occi","oksi");
        body = replace(body,"occe","ochee"); //?
        body = replace(body,"occo","okuh");
        body = replace(body,"occu","okuh"); //Just went down the list on http://www.morewords.com/contains/cc - Useful, if laborious
        //E at end - Some interference possible with C's
        body = replace(body,"use\n","yooz\n");
        body = replace(body,"used\n","yoozd\n"); //special case
        //Note: Need to make sure that plurals of e-enders are covered, i.e. wives.
        body = replace(body,"like\n","lahyk\n");
        body = replace(body,"ole\n","ohl\n"); //hyperbole will suffer
        body = replace(body,"ose\n","ohz\n");
        body = replace(body,"ame\n","eym\n");
        body = replace(body,"ese\n","eez\n");
        body = replace(body,"have\n","hav\n");
        body = replace(body,"ave\n","eyv\n");
        body = replace(body,"eive\n","eev\n");
        body = replace(body,"vive\n","vahyv\n");
        body = replace(body,"ive\n","iv\n");
        body = replace(body,"eve\n","eev\n");
        body = replace(body,"ile\n","ahyl\n");
        //System.out.println(replace(replace("while ","wh","hu"),"ile\n","ahyl\n")); //huahyl
        body = replace(body,"gle\n","guhl\n");
        body = replace(body,"base\n","beys\n"); //And now the ends-with function on scrabblefinder.com was useful
        body = replace(body,"case\n","ceys\n"); //Don't need to allow for c->k if c's are below
        body = replace(body,"chase\n","Ceys\n"); //ch == C
        body = replace(body,"erase\n","ihreys\n");
        body = replace(body,"ase\n","eez\n");
        body = replace(body,"olve\n","olv\n");
        body = replace(body,"alve\n","ahv\n");
        body = replace(body,"elve\n","elv\n");
        body = replace(body,"some\n","suhm\n");
        body = replace(body,"come\n","cuhm\n"); //Need to move this up
        body = replace(body,"ome\n","ohm\n");
        body = replace(body,"tle\n","l\n"); //This is what dictionary.com said to do, and I live to serve
        body = replace(body,".discipline\n",".disipline\n");
        body = replace(body,"ine\n","ahyn\n");
        body = replace(body,".one\n",".uuhn\n");
        body = replace(body,"done\n","duhn\n");
        body = replace(body,"none\n","nuhn\n");
        body = replace(body,"one\n","ohn\n");
        body = replace(body,"ake\n","eyk\n");
        body = replace(body,"ope\n","ohp\n");
        body = replace(body,"rue\n","roo\n");
        body = replace(body,"ife\n","ahyf\n");
        body = replace(body,"bead\n","beed\n");
        body = replace(body,".read\n",".reed\n");
        body = replace(body,"nead\n","need\n");
        body = replace(body,"lead\n","leed\n");
        body = replace(body,"ead\n","ed\n"); //general
        body = replace(body,"ade\n","eyd\n");
        //ere - their vs there
        body = replace(body,"ere\n","eir\n");
        //ore, as in fore, bore
        body = replace(body,"ore","ohr");
        body = replace(body,".are\n",".ahr\n");
        body = replace(body,"are\n","air\n");
        body = replace(body,"oke\n","ohk\n");
        body = replace(body,"tire","tahyuhr"); //NOT \n or e
        body = replace(body,"aire\n","air\n");
        body = replace(body,"ire\n","yuhr\n"); //?
        body = replace(body,"ype\n","ahyp\n");
        body = replace(body,"urge\n","urj\n");
        body = replace(body,"erge\n","urj\n"); //Not a mistake
        body = replace(body,"arge\n","hrj\n");
        body = replace(body,"orge\n","wrj\n");
        body = replace(body,"ime\n","ahym\n");
        body = replace(body,"sle\n","ahyl\n");
        body = replace(body,"promise\n","promis\n");
        body = replace(body,"aise\n","eyz\n");
        body = replace(body,"ise\n","ahyz\n");
        body = replace(body,"lse\n","ls\n");
        body = replace(body,"igue\n","teeg\n");
        body = replace(body,"sce\n","es\n");
        body = replace(body,"que\n","k\n");
        body = replace(body,"udge\n","uhj\n");
        body = replace(body,"dge\n","j\n"); //NOT sure
        body = replace(body,"age\n","aij\n");
        //gue - This one was irritating, might not be right
        body = replace(body,"logue\n","awg\n");
        body = replace(body,"gogue\n","awg\n");
        body = replace(body,".morgue\n",".mawrg\n");
        body = replace(body,".fugue\n",".fyoog\n");
        body = replace(body,".segue\n",".segwey\n");
        body = replace(body,"rgue\n","rgyoo\n");
        body = replace(body,"gue\n","eeg\n");
        //-nge
        body = replace(body,"nge\n","nj\n"); //problem with sing vs singe not really being separable at the gerund-testing level
        body = replace(body,"sinjing\n","singing\n"); //comprehensive fix for gerund mishaps
        body = replace(body,"slinjing\n","slinging\n");
        body = replace(body,"strinjing\n","stringing\n");
        body = replace(body,"swinjing\n","swinging\n");
        body = replace(body,"brinjing\n","bringing\n");
        body = replace(body,"flinjing\n","flinging\n");
        body = replace(body,"prinjing\n","pringing\n");
        body = replace(body,".winjing\n",".winging\n");
        body = replace(body,".zinjing\n",".zinging\n");
        body = replace(body,".dinjing\n",".dinging\n");
        body = replace(body,".pinjing\n",".pinging\n");
        //END E's

        //s at end - 1.7.4.5 -> unneeded, I think
        //body = replace(body,"es\n","ez\n"); //Needs to go before c->s conversion, since C's are all soft S's
        //This is a big thing. I moved the c down mainly to allow for the s->z convertor to do its job, and the judgement on whether or not this messes things up is pending.

        //START C 1.7 - moved so that higher number of characters in target gets preference, blocks kept cohesive
        //Stolen from the "necessary" bin.
        body = replace(body,"ch","C"); //Although both versions of C work, I'm assuming capitalized, so no lowercase c's are allowed in the text
        body = replace(body,"accent","aksent");
        body = replace(body,"exercise\n","eksersahyz\n");
        body = replace(body,".once",".wuhns");
        body = replace(body,"preface\n","prefis\n"); //special
        body = replace(body,"icise\n","uhsahyz\n");
        body = replace(body,"rcise\n","ruhsahyz\n");
        body = replace(body,".tacit\n",".tasit\n");
        body = replace(body,"ciate\n","sheeeyt\n");
        body = replace(body,"vate\n","vit\n"); //pulled from E section, might be a sign of things to come
        body = replace(body,"literate\n","literit\n");
        body = replace(body,"ate\n","eyt\n");
        body = replace(body,"cision\n","sizhuhn\n");
        body = replace(body,"cise\n","sahys\n");
        body = replace(body,"cist\n","sist");
        body = replace(body,"uce\n","us\n");
        body = replace(body,"uces\n","usez\n"); //z incorporated
        body = replace(body,"uced\n","usst\n"); //D's
        body = replace(body,"came\n","keym\n");
        body = replace(body,"came","kamuh");
        body = replace(body,"tual\n","Cual");
        body = replace(body,".acid\n",".asid\n");
        body = replace(body,".aci",".uhsi");
        body = replace(body,"ierce\n","eers\n");
        body = replace(body,"ince\n","ins\n");
        //body = replace(body,".ance",".ahns");
        body = replace(body,".trance",".trahns");
        body = replace(body,"dance\n","dahns\n");
        body = replace(body,"Cance","Cahns");
        body = replace(body,"cance","cahns");
        body = replace(body,"lance","lahns");
        body = replace(body,"vance","vahns");
        body = replace(body,"ance\n","uhns\n");
        body = replace(body,"all\n","awl\n");
        body = replace(body,"tice\n","tis\n");
        body = replace(body,"arice\n","eris\n");
        body = replace(body,"orice\n","uhis\n");
        body = replace(body,"cipice\n","suhpis\n"); //patch for precipice
        body = replace(body,"ipice\n","uhpis\n");
        body = replace(body,".vice\n","vahys\n");
        body = replace(body,"vice\n","vis\n");
        body = replace(body,"ice\n","ahys\n"); //Long S. NOT sure about \n's
        body = replace(body,"egy\n","ijee\n"); //possibilities/strategies fix, I have no idea how they ended up "kiez"
        body = replace(body,"ity\n","itee\n");
        body = replace(body,"ite\n","ahyt\n");
        body = replace(body,"ong\n","ong\n");
        body = replace(body,"ull\n","ool\n");
        body = replace(body,"cide\n","sahyd\n");
        body = replace(body,"ide\n","ahyd\n");
        body = replace(body,"ence\n","ens\n");
        body = replace(body,"ces\n","seez\n");
        body = replace(body,"cez\n","seez\n"); //In case of S->Z
        body = replace(body,"ce\n","s\n");
        body = replace(body,"ci\n","sahy\n");
        body = replace(body,"oy\n","oi\n");
        body = replace(body,"ace\n","eys\n");
        body = replace(body,"ely\n","lee\n"); //MUST BE LAST IN \N
        body = replace(body,".scie",".sahye"); //For Science!
        body = replace(body,"sciou","shuh"); //For Conscience!
        body = replace(body,"cious","shuhs"); //For Ithaca!
        body = replace(body,"scio","shuh");
        body = replace(body,"scie","shuh");
        body = replace(body,"ply\n","plahy\n");
        body = replace(body,".by\n",".bahy\n");
        body = replace(body,".my\n",".mahy\n");
        body = replace(body,".die\n",".dahy\n");
        body = replace(body,".dye\n",".dahy\n");
        body = replace(body,".bye\n",".bahy\n"); //conflict
        body = replace(body,"hype","hahype");
        body = replace(body,"hypo","hahypo");
        body = replace(body,"hypn","hipn");
        body = replace(body,"hyphen","hahyfuhn");
        body = replace(body,"hyfen","hahyfuhn"); //ph->f
        body = replace(body,"yp","ip");
        body = replace(body,"tion","Suhn"); //1.8
        body = replace(body,"sion","zhuhn");
        body = replace(body,"cean","Suhn");
        body = replace(body,"ture","Cur");
        body = replace(body,"cies","seez"); //prophecies
        body = replace(body,"ciez","seez"); //s->z already done
        body = replace(body,"iew","yoo");
        body = replace(body,".face",".feys");
        body = replace(body,"face","feys");
        body = replace(body,"acen","eysuhn"); //Don't get complacent
        body = replace(body,"ician","ishuhn"); //musician
        body = replace(body,"cism","sizuhm"); //anglicanism
        body = replace(body,"cial","shul");
        body = replace(body,".acq",".akw"); //might need refinement
        body = replace(body,"cque","ke");
        body = replace(body,"acquaint","uhkweynt");
        body = replace(body,"cing","sing"); //1.6.5 - odyssey test
        body = replace(body,"exce","ikse");
        body = replace(body,"excit","iksahyt");
        body = replace(body,"excis","eksahyz");
        body = replace(body,"ici","isi"); //Sicily
        body = replace(body,"iec","ees"); //Piece/Peace -> Pees
        body = replace(body,"eac","ees");
        body = replace(body,"ight","ahyt");
        body = replace(body,"cep","sep");
        body = replace(body,"cin","sin");
        body = replace(body,".cit",".sit");
        body = replace(body,"cip","sip");
        body = replace(body,"cif","sif"); //NOT sure
        body = replace(body,"icc","ik");
        body = replace(body,"icn","ikn");
        body = replace(body,"sce","se");
        body = replace(body,"sci","si");
        body = replace(body,"scy","sahy");
        //body = replace(body,"sco","sko");
        body = replace(body,"cea","sea");
        body = replace(body,"nci","nsi"); //might need refinement
        body = replace(body,"ncy","nsee");
        body = replace(body,"cei","see");
        body = replace(body,"cee","see");
        body = replace(body,"cent","sent"); //odyssey
        body = replace(body,"ap\n","ap\n");
        body = replace(body,"ppen","pen"); //double p's, might NOT be done
        body = replace(body,"ppl","puhl");
        body = replace(body,"upp\n","uhp");
        body = replace(body,"oppor","oper");
        body = replace(body,"opp","uhp");
        body = replace(body,"ypp","ip");
        //starting with c
        body = replace(body,".cy",".sahy");
        body = replace(body,".cir",".sur");
        body = replace(body,".cid",".sahyd");
        body = replace(body,".ci",".si");
        body = replace(body,".cer",".sur");
        body = replace(body,".ce",".se");
        body = replace(body,"ck","k");
        body = realReplace("QQQ",body,"C\n","k\n");
        body = realReplace("QQQ",body,"ch\n","k\n");
        body = replace(body,"sc","sk");
        body = replace(body,"cy","see"); //1.4.3 - si->see
        body = replace(body,"ce","se");
        body = replace(body,"ca","ka");
        body = replace(body,"co","ko");
        body = replace(body,"cu","ku");
        body = replace(body,"ct","kt");
        body = replace(body,"cl","kl");
        body = replace(body,"cr","kr");
        body = realReplace("QQQ",body,".c",".k"); //This can possibly leave lowercase c's in the text, although I think that all properly spelled words should be covered here.
        body = realReplace("QQQ",body,"c\n","k\n"); //to stop mischief
        //END C'S

        //Not sure where to put this section
        //ss
        body = replace(body,"ss","s");
        //gh
        body = replace(body,"gha","gah"); //This section needs work
        body = replace(body,"gho","goh");
        body = replace(body,"ought","awt");
        body = replace(body,"though","thoh");
        body = replace(body,"bough","bou");
        body = replace(body,"cough","kof");
        body = replace(body,"igh","ahy");
        body = replace(body,"gh\n","\n");
        body = replace(body,"gh","g");
        //to, too, two - Just a quick patch for those three words, not a general solution to any problem I can see
        body = replace(body,".to\n",".too\n");
        body = replace(body,".two\n",".too\n");
        //q at end
        body = realReplace("QQQ",body,"q\n","k\n");
        //w at end
        body = replace(body,".low\n",".loh\n"); //special cases
        body = replace(body,".row\n",".roh\n");
        body = replace(body,"ow\n","au\n");
        //.sy
        body = replace(body,".syr",".suhr"); //Moved up to e-enders
        body = replace(body,".syr",".sir");
        body = replace(body,".sly",".slahy");
        body = replace(body,".lying\n",".lahying\n");
        body = replace(body,".ly",".li");
        //sz->siz - The coward's way out. I need to sit down and make this thing more cohesive
        body = replace(body,"sz\n","siz\n");
        body = realReplace("qqq",body,"y\n","ee\n");
        body = realReplace("qqq",body,"ehee\n","ehy\n");
        body = realReplace("qqq",body,"ahee\n","ahy\n");
        body = realReplace("qqq",body,"eee\n","ey\n"); //fixing issues raised by y->ee as compared to other phonetics
        String[] temp = {"en","st","un","c","f","g","s","t",""};
        body = replace(body,"ctable\n","kteybuhl\n"); //save the c's!
        for(int i = 0; i<temp.length;i++)
            if(temp[i].equals("c"))
                body = replace(body,"kable\n","eybuhl\n");
            else
                body = replace(body,temp[i]+"able\n","eybuhl\n");
        body = replace(body,"able\n","uhbuhl\n"); //This one is either "eybuhl" for a few short words or "uhbuhl" for all others
        body = replace(body,"ble\n","buhl\n");
        //The annoying part is the hodge-podgeness of English. The only workable route may be just to demand phonetic spelling in cases like "Tow"

        //Necessary --Moved down to make ease-of-use conversions easier
        body = replace(body,"th","T");
        body = replace(body,"sh","S");
        //body = replace(body,"ch","C"); //took some liberties here, capitalized the C to make room for the c->k/s conversion
        body = replace(body,"x","X"); //Consistency - x is really a compound character of ks.
        body = replace(body,"qu","ku");
        //body = replace(body,"q","ku");
        /*
        body = replace(body,"wa","ua"); //Unnecessary? I think not! I'm not sure why, but no.
        body = replace(body,"we","ue");
        body = replace(body,"wi","ui");
        body = replace(body,"wo","uo");
        body = replace(body,"wu","uu");
        */
        body = replace(body,"w","u");
        //exception catcher
        if(debug_end_e){
            body = replace(body,"e\n","Q\n"); //Just for debugging
            body = replace(body,".TQ",".Te");
            body = replace(body,".bQ",".be");
            body = replace(body,".seQ",".seee");
            body = replace(body,".mQ",".me");
            body = replace(body,"eQ\n","ee\n");
            body = replace(body,"Qy\n","ey\n");
            body = replace(body,".hQ",".he");
            body = replace(body,".shQ",".she");
        }
        return body.substring(1,body.length()-1); //clipping first/last '\n'
    }

    private static String replace(String body, String target, String sub){
        return realReplace("",body,target,sub);
    }

    private static String realReplace(String sofar, String body, String target, String sub) {
        int target_size = target.length();
        int sub_size = sub.length();
        //'.'==' '
        if(target.startsWith(".")){
            body = replace(body,(" "+target.substring(1,target_size)),(" "+sub.substring(1,sub_size)));
            body = replace(body,("\n"+target.substring(1,target_size)),("\n"+sub.substring(1,sub_size)));
            /*
            //re-
            if(((target_size>=5)&&(!target.substring(1,5).equals("rere")))||(target_size<3)) //clumsy
                body = replace(body,".re"+target.substring(1,target_size),".ree"+sub.substring(1,target_size));
            */
        }
        if(target.endsWith("\n")){ //checks for spaces and for plurals, also does s->z conversion where necessary
            body = replace(body,(target.substring(0,target_size-1)+" "),(sub.substring(0,sub_size-1)+" ")); //space substitution
            if(sofar.length()<=2){ //that took longer than it should have. Anyone who can suggest improvements is welcome to try.
                if((!sofar.contains("z"))&&(!sofar.contains("l"))){ //I think contains() covers it. It saves time over endsWith() if it stops unnecessary calls to realReplace(), as long as it doesn't cut out possible permutations
                    if(!sofar.contains("i")) // s->z
                        if((target_size>=2)&&(target.charAt(target_size-2)!='s')&&(target.charAt(target_size-2)!='z')) //Double-checking s/z
                            if(target.charAt(target_size-2)=='e')
                                if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e'))
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"z\n"));
                                else
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"ez\n")); //s->z
                            else if(((target_size>=2)&&(target.charAt(target_size-2)=='y'))||(target_size<3)) //bug stopper
                                if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e'))
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies\n"),(sub.substring(0,sub_size-1)+"z\n"));
                                else
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies\n"),(sub.substring(0,sub_size-1)+"iez\n")); //s->z
                            else
                                body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"z\n")); //s->z
                    /*
                    //y
                    body = realReplace("qqq",body,"ay\n","ey\n"); //stopgap, might want to revisit
                    body = replace(body,"ey\n","ey\n");
                    body = realReplace("qqq",body,"oy\n","oi\n");
                    body = realReplace("qqq",body,"uy\n","ahy\n");
                    body = realReplace("qqq",body,"y\n","ee\n"); //might need generalized in replace()
                    body = replace(body,"ty","tahy");
                    */
                    //ly, focus on y as of 1.7.4.3 - It might need some work
                    if(target.equals("sly\n")) //special case
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    //ly
                    if((target_size>=5)&&(target.substring(target_size-5,target_size-1).equals("able")))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"y\n"),(sub.substring(0,sub_size-4)+"lee\n")); //ably
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='y'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"ily\n"),(sub.substring(0,sub_size-2)+"uhlee\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='o'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"pily\n"),(sub.substring(0,sub_size-1)+"uhlee\n"));
                    else
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    //y
                    if((target_size>=2)&&(target.charAt(target_size-2)=='a'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-2)+"ey\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-1)+"y\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='o'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-1)+"i\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='u'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-2)+"ahy\n"));
                    else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"py\n"),(sub.substring(0,sub_size-1)+"ee\n"));
                    else
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n"));
                    if((!sofar.contains("g"))&&(!sofar.contains("i"))&&(!sofar.contains("r"))){ //covers multiple
                        if((!target.endsWith("g\n"))&&(!target.endsWith("gs\n"))&&(!target.endsWith("gz"))) //leave no base uncovered
                            if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ie")))
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-3)+"ying\n"),(sub.substring(0,sub_size-1)+"ing\n")); //replacing 'ie' before gerund
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-2)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //removing 'e'
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ping\n"),(sub.substring(0,sub_size-1)+"ing\n"));
                            else if((!target.endsWith("gs\n"))&&(!target.endsWith("gz"))) //no "ing\n" or s\z at end
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //no e, presumably ends in consonant
                        if((!sofar.contains("a"))&&(!sofar.contains("d"))) //ish
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ed")))||(target_size<3))
                                    body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"ish\n"),(sub.substring(0,sub_size-1)+"ish\n"));
                                else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                                    body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"pish\n"),(sub.substring(0,sub_size-1)+"ish\n"));
                        if(!sofar.contains("a")) //able
                            if((target_size>=2)&&(target.charAt(target_size-2)=='p')){
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                            }
                            else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                            else if(target.equals("fly")||target.equals("unfly"))
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
                            else if(((target_size>=4)&&(target.substring(target_size-4,target_size-1).equals("ing")))||(target_size<4))
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"eybuhl\n"));
                    }
                    if((!sofar.contains("g"))&&(!sofar.contains("d"))){ //covers multiple
                        if(target_size>=2) //d at end
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                if(target.charAt(target_size-2)=='e')
                                    body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"ed\n")); //NOT st
                                else if(target.charAt(target_size-2)=='s')
                                    body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"ed\n"));
                                else if(target.substring(target_size-3,target_size-1).equals("se"))
                                    body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"ed\n"));
                                else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                                    body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ped\n"),(sub.substring(0,sub_size-1)+"ed\n"));
                                else if((target.charAt(target_size-2)!='s')||((target.substring(target_size-3,target_size-1).equals("ss"))))
                                    body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"st\n"));
                        //er
                        if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                            body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e'
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                            body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"per\n"),(sub.substring(0,sub_size-1)+"er\n"));
                        else
                            body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n"));
                    }
                    /*
                    //ate, not bothering with forbiddances - Never mind
                    if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                        body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e'
                    else
                        body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n"));
                    */
                    //Why do these need to be dealt with here?
                    //Because these permutations need to be available to figure out which \n grammars to apply
                    //ed, ish, ly, ing, able, edly, ishly, ably, lying, eding, abling
                    //Dirty method - add a recursion counter to replace()
                    //6 max - ed ish ly ing able z
                    //ablingly, lyingly - 3
                    //ablinger
                    //s-z, ly-l, ing-g, d-d, ish-i, able-a
                    //everything abides i, nothing abides s/l
                    //nevermind, not much likes i either
                    //a allows l/s/d,
                    //a forbids a, i
                    //d forbids d, i
                    //g forbids d, g, i, a
                    //i forbids s, g, i, a
                    //er-r
                    //r forbids g, i, a
                    //r is forbidden by s, l, g, d
                    //I think that forbiddance is total - no forbidden suffixes at any point before
                }
            }
        }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i+=(sub_size-target_size);
            }
        }
        return body;
    }
}

Edited January 18, 2012 by Kurkistan
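The rule notation in replaceLetters() (and in the "ged\n" discussion above) depends on two markers. readFile() wraps the whole text in '\n'; a leading '.' in a target stands for a word start (after a period, a space, or a line break), and a trailing '\n' stands for a word end (before a space or a line break). Below is a minimal sketch of just that expansion, using a hypothetical class `BoundaryRules`. It leaves out the suffix permutations realReplace() generates, and, unlike realReplace(), it rejects rules whose replacement doesn't carry the same markers as the target. realReplace() simply trims one character off the replacement, so a rule like "tual\n" -> "Cual" loses its last letter before a space, which may account for the missing 'l' Turos reports below.

```java
class BoundaryRules {
    // Applies one rule using the transliterator's boundary markers:
    //   leading '.'   -> also match after ' ' and '\n' (word start)
    //   trailing '\n' -> also match before ' '         (word end)
    static String applyRule(String body, String target, String sub) {
        if (target.startsWith(".") != sub.startsWith(".")
                || target.endsWith("\n") != sub.endsWith("\n")) {
            throw new IllegalArgumentException("markers differ: " + target.trim() + " -> " + sub.trim());
        }
        if (target.startsWith(".")) {
            String t = target.substring(1);
            String s = sub.substring(1);
            body = applyRule(body, " " + t, " " + s);
            body = applyRule(body, "\n" + t, "\n" + s);
        }
        if (target.endsWith("\n")) {
            String t = target.substring(0, target.length() - 1);
            String s = sub.substring(0, sub.length() - 1);
            body = body.replace(t + " ", s + " ");
        }
        // Finally the literal form, e.g. ".acc" right after a moved period.
        return body.replace(target, sub);
    }

    public static void main(String[] args) {
        String body = "\nthe ace of spaces\naccord in raccoon land\n";
        body = applyRule(body, "ace\n", "eys\n"); // word-final only: "spaces" is untouched
        body = applyRule(body, ".acc", ".uhk");   // word-initial only: "raccoon" is untouched
        System.out.print(body);
    }
}
```

Failing loudly on a marker mismatch is a design choice for the sketch: a typo in one of several hundred rule strings then shows up as an exception instead of as a silently clipped word in the output.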
Zas678 Posted January 18, 2012

*Applause*
Turos (Author) Posted January 18, 2012

*cracks knuckles* This grammar you speak of will submit to my awesome reading skills
Wisdom Posted January 18, 2012

Is there a way for the Transliterated Alethi to be Transliterated back?
Turos (Author) Posted January 18, 2012 (edited)

Is there a way for the Transliterated Alethi to be Transliterated back?

I'm gonna pull a 'no' on this one, as there are multiple letter combinations that convert to the same sound in phonetics. For example, 'ks' converts to 'X' in this program, and 'x' also converts to 'X', so it would be impossible to differentiate between the two. There are other cases, but it's still too early in the morning for me to come up with them, haha. It would be possible if Kurkistan made a secondary file that recorded every single conversion and listed its location on the page, but I don't dare ask him to put himself through something like that... and it wouldn't work on text someone typed out in Alethi in the first place. Sorry.

----------------------

@Kurkistan: 'ritual' > 'riCual': you got the 'tual' to be 'Cua', but the 'l' is missing. 'factual' is 'facCua' now: the 'l' is missing, and the 'c' didn't convert to 'k' like it did before.

'introduction' used to convert to 'introduktion' and now converts to 'introducSuhn'. It seems the 'tion' conversion caused the earlier 'c' conversion to cancel. Same thing with 'section', which loses the 'c' conversion to the 'tion' conversion.

'each' used to be 'eaC', now it's 'eak'. 'research' was 'researC', now 'researk'. 'which' used to convert to 'uhiC' and now converts to 'huik'.

I like the 'sion' to 'zhun' conversion you added, very nice. It is off for cases like 'passion' but perfect for 'diversion'. I use the 'zh' as well in phonetics, and it's cool to see it used elsewhere! I think maybe if it's 'sion', 'zhuhn' is correct, but 'ssion' will probably always be 'Suhn'. Not a big deal though.

'case,' used to convert to 'kase,', but now to 'seys'. I wouldn't imagine the comma removal causing this glitch, but thought it best to indicate it was there before.

'however,' used to convert to 'houever,', but now to 'houeever'. Extra 'e'.
Not important, but perhaps a clue to the comma removal actually causing issues.

'associated' used to convert to 'asociated', but now to 'asoSeeeyted'. No comma involved this time, weird. Oh, I think I understand: it's a combo of 'See' with a long e and 'eyted' with a long a sound. Never mind ^^

'accomplishing' was 'aksompliSing', now 'uhkuhmpliSing'. Not a big deal, but it gives a heads-up on how changes were made to other conversions. Weird how 'accessible' converts to 'aksesibuhl' with no problem in both versions.

I noticed you mentioned attacking 'pp'. Not sure what you meant, but here's one still: 'apparatus'.

Ah shoot... I hate English exceptions. I never thought that the 'tion' conversion would screw up words like 'bastion', where 'Suhn' would be improper and 'Cuhn' would be more fitting. I don't think this is really worth tackling, though. Darn you, English language!!! Spanish would be so much easier.

Anywho, ya, the tab problem still happened. I'll post the before and after attachments. The tab is in the fourth paragraph. If it helps, here's a table of character values; the site lists the "horizontal tab" character as number 9, which is its ASCII value. http://www.asciitable.com/

Before: test4.txt After: Alethi_test4.txt (I'll get rid of these attachments after you respond to my glitch update.)

Edited January 18, 2012 by Turos
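Turos's objection to reverse transliteration is that the mapping is many-to-one. A minimal Java sketch makes that concrete. It models only the two rules he names ('ks' -> 'X' and 'x' -> 'X'; the transliterator itself applies many more) and groups input words by their output; any output with more than one source word cannot be mapped back without extra information. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NotInvertible {
    // Only the two rules Turos names; everything else passes through unchanged.
    static String forward(String word) {
        return word.replace("ks", "X").replace("x", "X");
    }

    /** Groups input words by their transliterated form, sorted by output. */
    static Map<String, List<String>> preimages(List<String> words) {
        Map<String, List<String>> byOutput = new TreeMap<>();
        for (String w : words) {
            byOutput.computeIfAbsent(forward(w), k -> new ArrayList<>()).add(w);
        }
        return byOutput;
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = preimages(List.of("marks", "marx", "tax"));
        System.out.println(groups); // {marX=[marks, marx], taX=[tax]}
        groups.forEach((out, ins) -> {
            if (ins.size() > 1) {
                System.out.println(out + " has no unique inverse: " + ins);
            }
        });
    }
}
```

Here "marks" and "marx" both land on "marX", which is exactly the ambiguity in the thread; a side record of the original spellings, as Turos suggests, is the only way to undo it.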
Kurkistan Posted January 18, 2012 (edited)

*Applause*

*Bows*

I'm gonna pull a 'no' on this one, as there are multiple letter combinations that convert to the same sound in phonetics.

Second that. If you want a Latin/Roman transliteration of an Alethi file, then your best bet is just to hope that the file was originally typed in English, and acquire that source file. It might be possible to transliterate raw Alethi, despite the fact that our Alethi alphabet has fewer characters: someone might be able to reverse-engineer my program, or create one of their own from scratch, to transliterate it into English. The problem is that the spelling of many English words is phonetically arbitrary, so you won't be able to get proper spelling unless you put in a ludicrous amount of work, and maybe not even then.
I'm away from my computer and working files for a few more hours, but the bugs you listed all look like relatively simple issues, not the death-log of your last test. Most of them are just stupid mistakes I made, like forgetting the second '\n' when going from "tual\n"->"Cual\n". I'll take a look at that tab problem as well, although that's almost certainly just an itsy-bitsy coding issue, not an implication-ridden grammatical error.

EDIT: Found it in the spoilered code already. I need to move removeCharacters() higher up in the function: there was a colon just before that tab, and that's what the period was moving to.

I agree wholeheartedly that Spanish would be easier. English spelling is what happens when you take one alphabet, use it to generate choose-your-own-adventure spelling for two completely different families of languages, and then mash those languages back together again, stealing from a few others along the way.

Edited January 18, 2012 by Kurkistan
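Kurkistan's fix is about ordering: strip the characters the font can't show before the period mover runs, so stray punctuation such as a colon can't affect where a period lands. Below is a simplified Java sketch of that two-stage pipeline. The whitelist mirrors removeCharacters() (tab, which is ASCII 9, plus newline, space, '.', and letters), but the mover is a hypothetical simplification, not periodMover() itself: it anchors each period at the first letter of its sentence and treats a line break as a sentence boundary, so leading tabs stay in place. It does not handle ellipses.

```java
public class PeriodPipeline {
    private static boolean isAsciiLetter(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
    }

    /** Stage 1: keep tab (ASCII 9), newline, space, '.', and ASCII letters; drop the rest, e.g. ':'. */
    static String filter(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (isAsciiLetter(ch) || ch == '\t' || ch == '\n' || ch == ' ' || ch == '.') {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    /** Stage 2: move each '.' to the first letter of its sentence; '\n' also ends a sentence. */
    static String movePeriods(String text) {
        StringBuilder out = new StringBuilder();
        int start = -1; // index in out of the current sentence's first letter, or -1 if none yet
        for (char ch : text.toCharArray()) {
            if (ch == '.') {
                if (start >= 0) {
                    out.insert(start, '.');
                } else {
                    out.append('.'); // no letters since the last boundary: leave the period in place
                }
                start = -1;
            } else {
                if (ch == '\n') {
                    start = -1; // a line without a period (a header) simply ends here
                } else if (start < 0 && isAsciiLetter(ch)) {
                    start = out.length();
                }
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String result = movePeriods(filter("Steps:\n\tMix well. Serve."));
        // Show the newline and tab explicitly; prints: Steps\n\t.Mix well .Serve
        System.out.println(result.replace("\t", "\\t").replace("\n", "\\n"));
    }
}
```

The header line "Steps" has no period, so it is left alone, and the tab stays in front of the moved period on the next line.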
Turos (Author) Posted January 18, 2012

Cheers for mashing!
Kurkistan Posted January 19, 2012

Dealt with Turos's bugs, added rules for suffixes which add a 't' onto the end of words, and added a few more "pp" rules, although there might be a few more floating around.

Specifically, a few of the bugs that Turos pointed out were actually intentional on my part, based upon Dictionary.com phonetics: I meant which->huiC. I messed up with case->seys, but it should have been case->keys all along. Many of the cases of .a->.uh are actually intentional, although it varies by word, and so they are still worth double-checking.

/**
 * Goal: Provide an easy means of transliterating Roman letters into Alethi script using Turos's font conventions.
 *
 *
 * @author Kurkistan, with significant developmental input from Turos
 * @date 01/18/2012
 * @version 1.8.6
 */
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.File;
import java.io.PrintWriter;
import java.io.IOException;
import java.util.Scanner;
import java.io.BufferedReader;
import java.util.Arrays;

public class AlethiTransliterator_1_8_6 {
    static boolean debug_char = false;
    static boolean debug_end_e = false;

    public static void main (String[] arg) throws IOException{
        Scanner input=new Scanner(System.in);
        System.out.print("Enter input file (full name of file in same directory): ");
        String temp = input.next();
        //temp = "Test.txt";
        final double startTime = System.currentTimeMillis();
        final double endTime;
        try {
            String alethi = convertText(temp);
            if(alethi.equals("&"))
                return;
            temp = "Alethi_"+temp;
            writeFile(alethi,temp);
            if(debug_char){
                String violations = allowedCharacters(alethi); //debugging blatant errors
                if(!violations.equals(""))
                    System.out.println("Unauthorized sections in text (Line:Violation):"+"\n"+violations);
            }
        } finally {
            endTime = System.currentTimeMillis();
        }
        final double duration = endTime - startTime;
        System.out.println("Execution time: "+(duration/1000)+" seconds");
    }
    private static String convertText(String roman) throws IOException {
        roman = readFile(roman); //text file
        if((roman.length()==1)&&(roman.charAt(0)=='&')) //invalid input, halt program; must run before removeCharacters(), which would strip the '&'
            return "&";
        if(!debug_char)
            roman = removeCharacters(roman);
        char[] body = roman.toCharArray();
        periodMover(body);
        roman = new String(body);
        String alethi = replaceLetters(roman);
        return alethi;
    }

    /**
     * Load a text file contents as a <code>String</code>.
     *
     * @param file The input file
     * @return The file contents as a <code>String</code>
     * @exception IOException IO Error
     */
    private static String readFile(String file) throws IOException {
        String whole = "";
        try {
            BufferedReader in = new BufferedReader(new FileReader(file));
            String str;
            while ((str = in.readLine()) != null) {
                whole = whole + str + '\n';
                //process(str);
            }
            in.close();
        } catch (IOException e) {
            System.out.println("File not in directory or misspelled.");
            return "&";
        }
        whole="\n"+whole.toLowerCase(); //convert to lower - keeping an extra \n at the end and beginning for replacement ease of use, will get rid of it
        return whole;
    }

    private static void writeFile(String text, String destination) throws IOException {
        File file = new File(destination);
        boolean exist = file.createNewFile();
        if (!exist) {
            System.out.println("Output file already exists.");
            System.exit(0);
        } else {
            FileWriter fstream = new FileWriter(destination);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(text);
            out.close();
            System.out.println("File created successfully.");
        }
    }

    private static String allowedCharacters(String body) {
        //c, q, w, x, th, sh, ch - Forbidden; I assume no lowercases of the special characters (C, X)
        //\n, ' ', '.', C, S/s, T/t, X, - Allowed
        char[] library = new char[29];
        String[] pairs = {"th","sh","ch"}; //These shouldn't trigger unless I made a serious mistake in the "necessary" section.
        char[] body_array = body.toCharArray();
        String violations = "";
        int line = 1; //for all of those +1ers out there
        int target_size = 2;
        int search = body.length() - target_size;
        for(int j = 0;j<pairs.length;j++) {
            line = 1; //restart the line count for each pair
            for(int i = 0; i<=search;i++)
                if(body_array[i]=='\n')
                    line++;
                else if(body.substring(i,i+target_size).equals(pairs[j]))
                    violations = violations + (line+":"+pairs[j]) + "; ";
        }
        library[0] = '\n';
        library[1] = ' ';
        library[2] = '.';
        library[3] = 'C';
        library[4] = 'S';
        library[5] = 'T';
        library[6] = 'X';
        int place = 7;
        for(int i = 97; i <=122; i++){
            if((i!=99)&&(i!=113)&&(i!=119)&&(i!=120)){ //c, q, w, and x
                library[place] = (char)i;
                place++;
            }
        }
        line = 1; //resetting
        for(int i = 0;i<body.length();i++)
            if(body_array[i]=='\n')
                line++;
            else if(Arrays.binarySearch(library,body_array[i])<0) //not in library
                violations = violations + (line+":"+body_array[i]) + "; ";
        return violations;
    }

    private static String removeCharacters(String body) {
        char[] library = new char[56];
        library[0] = '\t'; //tab
        library[1] = '\n';
        library[2] = ' ';
        library[3] = '.';
        int place = 4;
        for(int i = 65; i <=90; i++){
            library[place] = (char)i;
            place++;
        }
        for(int i = 97; i <=122; i++){
            library[place] = (char)i;
            place++;
        }
        for(int i = 0; i < body.length(); i++)
            if(Arrays.binarySearch(library,body.charAt(i))<0){ //I felt embarrassed by my earlier search algorithm.
                body = body.substring(0,i)+body.substring(i+1,body.length());
                i--;
            }
        return body;
    }

    /**
     * In the Alethi alphabet, sentences start with a period '.' and don't end with anything.
     */
    private static void periodMover(char[] array) {
        int temp = 0;
        for(int i=0;i<array.length;i++) {
            if(array[i]=='.'){
                if(!(((array.length - i) >= 3)&&(array[i]==array[i+1])&&(array[i+1]==array[i+2]))) //ellipsis
                {
                    twistRight(array,temp,i);
                    i++;
                    while(i<array.length)
                        if(!inAlphabet(array[i]))
                            i++;
                        else
                            break; //Yes, the cardinal sin.
                    temp=i;
                }
                else if(((array.length-i)>=3)&&(array[i]==array[i+1])&&(array[i+1]==array[i+2]))
                {
                    for(int j=0;j<3;j++)
                        twistRight(array,temp+j,i+j);
                    i+=3;
                    while(i<array.length)
                        if(!inAlphabet(array[i]))
                            i++;
                        else
                            break; //Yes, the cardinal sin.
                    temp=i;
                }
            }
            else if(array[i]=='\n')
                temp=i+1; //Doesn't allow sentences to continue after true line breaks. Enables no-period headers and whatnot.
        }
    }

    private static boolean inAlphabet(char character) {
        char[] library = new char[26];
        int place = 0;
        for(int i = 97; i <=122; i++){
            library[place] = (char)i;
            place++;
        }
        if(Arrays.binarySearch(library,character)>=0) //I felt embarrassed by my earlier search algorithm.
            return true;
        return false;
    }

    private static void twistRight(char[] array, int start, int end) {
        if (start==end)
            return;
        char a = array[start];
        char b;
        array[start] = array[end]; //'.', although this is generalized
        while(start!=end) {
            start++;
            b = array[start];
            array[start] = a;
            a = b;
        }
    }

    public static void test() {
        String body = "\nsnapping snapper snappily snappy snaps snap snapped snappable snappably\n"; //snapping snapper snappily snappy snaps snap snapped snappable snappably.
        String target = "ap\n";
        String sub = "op\n";
        System.out.println(replace(body,target,sub));
        int target_size = target.length();
        int sub_size = sub.length();
        String sofar = "";
        if((target_size>=2)&&(target.charAt(target_size-2)=='p')){
            body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
            body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
        }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i+=(sub_size-target_size);
            }
        }
        System.out.println(body);
    }

    /**
     * Special characters: For t, use lower case t. For th, use capital T. For s, use lower case s. For sh, use capital S. For ch, use c.
     * X will print a combination of k and s. For q and w, use your imagination. Technically speaking, q is a combination of k and u. W is basically a combination of a long u ("oo") and any other vowel: a e i o and short u ("uh")
     */
    private static String replaceLetters(String body) {
        //Ease of use
        //1.3.5-Threw in an If statement in the replace function to deal with space and \n at the same time
        //ph
        body = replace(body,"ph","f");
        //anti-
        body = replace(body,".anti",".antahy");
        //wh
        body = replace(body,"who\n","hoo\n");
        body = replace(body,"where","huair"); //changed w to u
        body = replace(body,"whir","huur");
        body = replace(body,"wh","hu"); //Might need more permutations
        body = replace(body,".accr",".uhkr"); //many many many
        body = replace(body,".acci",".aksi");
        body = replace(body,".accord",".uhkawrd");
        body = replace(body,".accomp",".uhkuhmp");
        body = replace(body,".acco",".uhko");
        body = replace(body,".accustom\n",".uhkuhstuhm\n");
        body = replace(body,".accolade\n",".akuhleyd\n");
        body = replace(body,".accus",".uhkyooz");
        body = replace(body,".accurs",".uhkurs");
        body = replace(body,".accur",".akyer");
        body = replace(body,".accum",".uhkyoom");
        body = replace(body,".accout",".uhkoot");
        body = replace(body,".accoun",".uhkount");
        body = replace(body,".acce",".akse");
        //the dreaded double c's
        body = replace(body,".ecc",".eks");
        body = replace(body,"ucca","uhka");
        body = replace(body,"ucco","uhko");
        body = replace(body,"uccu","uhku");
        body = replace(body,".occ",".uhk");
        body = replace(body,"ucce","uhkse");
        body = replace(body,"ucci","uhksi");
        body = replace(body,"occup","okyuh"); //very special case
        body = replace(body,"occa","uhkah");
        body = replace(body,"occi","oksi");
        body = replace(body,"occe","ochee"); //?
        body = replace(body,"occo","okuh");
        body = replace(body,"occu","okuh");
        //Just went down the list on http://www.morewords.com/contains/cc - Useful, if laborious
        //E at end - Some interference possible with C's
        body = replace(body,"use\n","yooz\n");
        body = replace(body,"used\n","yoozd\n"); //special case
        //Note: Need to make sure that plurals of e-enders are covered, i.e. wives.
        body = replace(body,"like\n","lahyk\n");
        body = replace(body,"ole\n","ohl\n"); //hyperbole will suffer
        body = replace(body,"ose\n","ohz\n");
        body = replace(body,"ame\n","eym\n");
        body = replace(body,"ese\n","eez\n");
        body = replace(body,"have\n","hav\n");
        body = replace(body,"ave\n","eyv\n");
        body = replace(body,"eive\n","eev\n");
        body = replace(body,"vive\n","vahyv\n");
        body = replace(body,"ive\n","iv\n");
        //body = replace(body,"ever\n","ever\n");
        body = replace(body,"eve\n","eev\n");
        //HOWEVER
        body = replace(body,"eever\n","ever\n");
        body = replace(body,"ile\n","ahyl\n");
        //System.out.println(replace(replace("while ","wh","hu"),"ile\n","ahyl\n")); //huahyl
        body = replace(body,"gle\n","guhl\n");
        body = replace(body,"base\n","beys\n"); //And now the ends-with function on scrabblefinder.com was useful
        body = replace(body,"case\n","keys\n");
        body = replace(body,"chase\n","Ceys\n"); //ch == C
        body = replace(body,"Case\n","Ceys\n"); //necessary?
        body = replace(body,"erase\n","ihreys\n");
        body = replace(body,"ase\n","eez\n");
        body = replace(body,"olve\n","olv\n");
        body = replace(body,"alve\n","ahv\n");
        body = replace(body,"elve\n","elv\n");
        body = replace(body,"some\n","suhm\n");
        body = replace(body,"come\n","cuhm\n"); //Need to move this up
        body = replace(body,"ome\n","ohm\n");
        body = replace(body,"tle\n","l\n"); //This is what dictionary.com said to do, and I live to serve
        body = replace(body,".discipline\n",".disipline\n");
        body = replace(body,"ine\n","ahyn\n");
        body = replace(body,".one\n",".uuhn\n");
        body = replace(body,"done\n","duhn\n");
        body = replace(body,"none\n","nuhn\n");
        body = replace(body,"one\n","ohn\n");
        body = replace(body,"ake\n","eyk\n");
        body = replace(body,"ope\n","ohp\n");
        body = replace(body,"rue\n","roo\n");
        body = replace(body,"ife\n","ahyf\n");
        body = replace(body,"bead\n","beed\n");
        body = replace(body,".read\n",".reed\n");
        body = replace(body,"nead\n","need\n");
        body = replace(body,"lead\n","leed\n");
        body = replace(body,"ead\n","ed\n"); //general
        body = replace(body,"ade\n","eyd\n");
        //ere - their vs there
        body = replace(body,"ere\n","eir\n");
        //ore, as in fore, bore
        body = replace(body,"ore","ohr");
        body = replace(body,".are\n",".ahr\n");
        body = replace(body,"are\n","air\n");
        body = replace(body,"oke\n","ohk\n");
        body = replace(body,"tire","tahyuhr"); //NOT \n or e
        body = replace(body,"aire\n","air\n");
        body = replace(body,"ire\n","yuhr\n"); //?
        body = replace(body,"ype\n","ahyp\n");
        body = replace(body,"urge\n","urj\n");
        body = replace(body,"erge\n","urj\n"); //Not a mistake
        body = replace(body,"arge\n","hrj\n");
        body = replace(body,"orge\n","wrj\n");
        body = replace(body,"ime\n","ahym\n");
        body = replace(body,"sle\n","ahyl\n");
        body = replace(body,"promise\n","promis\n");
        body = replace(body,"aise\n","eyz\n");
        body = replace(body,"ise\n","ahyz\n");
        body = replace(body,"lse\n","ls\n");
        body = replace(body,"igue\n","teeg\n");
        body = replace(body,"igue\n","teeg\n");
        body = replace(body,"sce\n","es\n");
        body = replace(body,"que\n","k\n");
        body = replace(body,"udge\n","uhj\n");
        body = replace(body,"dge\n","j\n"); //NOT sure
        body = replace(body,"age\n","aij\n");
        //gue - This one was irritating, might not be right
        body = replace(body,"logue\n","awg\n");
        body = replace(body,"gogue\n","awg\n");
        body = replace(body,".morgue\n",".mawrg\n");
        body = replace(body,".fugue\n",".fyoog\n");
        body = replace(body,".segue\n",".segwey\n");
        body = replace(body,"rgue\n","rgyoo\n");
        body = replace(body,"gue\n","eeg\n");
        //-nge
        body = replace(body,"nge\n","nj\n"); //problem with sing vs singe not really being separable at the gerund-testing level
        body = replace(body,"sinjing\n","singing\n"); //comprehensive fix for gerund mishaps
        body = replace(body,"slinjing\n","slinging\n");
        body = replace(body,"strinjing\n","stringing\n");
        body = replace(body,"swinjing\n","swinging\n");
        body = replace(body,"brinjing\n","bringing\n");
        body = replace(body,"flinjing\n","flinging\n");
        body = replace(body,"prinjing\n","pringing\n");
        body = replace(body,".winjing\n",".winging\n");
        body = replace(body,".zinjing\n",".zinging\n");
        body = replace(body,".dinjing\n",".dinging\n");
        body = replace(body,".pinjing\n",".pinging\n");
        //END E's
        //s at end - 1.7.4.5 -> unneeded, I think
        //body = replace(body,"es\n","ez\n");
        //Needs to go before c->s conversion, since C's are all soft S's
        //This is a big thing.
        //I moved the c down mainly to allow for the s->z converter to do its job, and the judgement on whether or not this messes things up is pending.
        //START C 1.7 - moved so that higher number of characters in target gets preference, blocks kept cohesive
        //Stolen from the "necessary" bin.
        body = replace(body,"ch","C"); //Although both versions of C work, I'm assuming capitalized, so no lowercase c's are allowed in the text
        body = replace(body,"accent","aksent");
        body = replace(body,"exercise\n","eksersahyz\n");
        body = replace(body,".once",".wuhns");
        body = replace(body,"preface\n","prefis\n"); //special
        body = replace(body,"icise\n","uhsahyz\n");
        body = replace(body,"rcise\n","ruhsahyz\n");
        body = replace(body,".tacit\n",".tasit\n");
        body = replace(body,"ciate\n","sheeeyt\n");
        body = replace(body,"vate\n","vit\n"); //pulled from E section, might be a sign of things to come
        body = replace(body,"literate\n","literit\n");
        body = replace(body,"ate\n","eyt\n");
        body = replace(body,"cision\n","sizhuhn\n");
        body = replace(body,"cise\n","sahys\n");
        body = replace(body,"cist\n","sist\n"); //sub keeps its trailing \n so the final 't' survives
        body = replace(body,"uce\n","us\n");
        body = replace(body,"uces\n","usez\n"); //z incorporated
        body = replace(body,"uced\n","usst\n"); //D's
        body = replace(body,"came\n","keym\n");
        body = replace(body,"came","kamuh");
        body = replace(body,"ct","kt");
        //factual
        body = replace(body,"tual\n","Cual\n");
        body = replace(body,".acid\n",".asid\n");
        body = replace(body,".aci",".uhsi");
        body = replace(body,".key\n",".kee\n"); //special
        body = realReplace("QQQ",body,".keys\n",".kees\n");
        body = replace(body,"ierce\n","eers\n");
        body = replace(body,"ince\n","ins\n");
        //body = replace(body,".ance",".ahns");
        body = replace(body,".trance",".trahns");
        body = replace(body,"dance\n","dahns\n");
        body = replace(body,"Cance","Cahns");
        body = replace(body,"cance","cahns");
        body = replace(body,"lance","lahns");
        body = replace(body,"vance","vahns");
        body = replace(body,"ance\n","uhns\n");
        body = 
            replace(body,"all\n","awl\n");
        body = replace(body,"appa","apuh");
        body = replace(body,"ppen","pen"); //double p's, might NOT be done
        body = replace(body,"pple\n","puhl\n");
        body = replace(body,"ppl","puhl");
        body = replace(body,"upp\n","uhp\n"); //sub keeps its trailing \n so the final 'p' survives
        body = replace(body,"oppor","oper");
        body = replace(body,"opp","uhp");
        body = replace(body,"ypp","ip");
        body = replace(body,"pp","p"); //Last ditch, should cover most before this
        body = replace(body,"tice\n","tis\n");
        body = replace(body,"arice\n","eris\n");
        body = replace(body,"orice\n","uhis\n");
        body = replace(body,"cipice\n","suhpis\n"); //patch for precipice
        body = replace(body,"ipice\n","uhpis\n");
        body = replace(body,".vice\n",".vahys\n"); //sub needs the leading '.' too, or the 'v' is dropped
        body = replace(body,"vice\n","vis\n");
        body = replace(body,"ice\n","ahys\n"); //Long S. NOT sure about \n's
        body = replace(body,"egy\n","ijee\n"); //possibilities/strategies fix, I have no idea how they ended up "kiez"
        body = replace(body,"ity\n","itee\n");
        body = replace(body,"ite\n","ahyt\n");
        body = replace(body,"ong\n","ong\n");
        body = replace(body,"ull\n","ool\n");
        body = replace(body,"cide\n","sahyd\n");
        body = replace(body,"ide\n","ahyd\n");
        body = replace(body,"ence\n","ens\n");
        body = replace(body,"ces\n","seez\n");
        body = replace(body,"cez\n","seez\n"); //In case of S->Z
        body = replace(body,"ce\n","s\n");
        body = replace(body,"ci\n","sahy\n");
        body = replace(body,"oy\n","oi\n");
        body = replace(body,"ace\n","eys\n");
        body = replace(body,".chull\n",".as\n");
        body = replace(body,".chull",".uhs"); //Assoc-
        body = replace(body,"ely\n","lee\n"); //MUST BE LAST IN \N
        body = replace(body,".scie",".sahye"); //For Science!
        body = replace(body,"sciou","shuh"); //For Conscience!
        body = replace(body,"cious","shuhs"); //For Ithaca!
        body = replace(body,"scio","shuh");
        body = replace(body,"scie","shuh");
        body = replace(body,"ply\n","plahy\n");
        body = replace(body,".by\n",".bahy\n");
        body = replace(body,".my\n",".mahy\n");
        body = replace(body,".die\n",".dahy\n");
        body = replace(body,".dye\n",".dahy\n");
        body = replace(body,".bye\n",".bahy\n");
        //conflict
        body = replace(body,"hype","hahype");
        body = replace(body,"hypo","hahypo");
        body = replace(body,"hypn","hipn");
        body = replace(body,"hyphen","hahyfuhn");
        body = replace(body,"hyfen","hahyfuhn"); //ph->f
        body = replace(body,"yp","ip");
        body = replace(body,"duct","duhkt");
        body = replace(body,"tion","Suhn"); //1.8
        body = replace(body,"ssion","Suhn"); //1.8.6
        body = replace(body,"sion","zhuhn");
        body = replace(body,"cean","Suhn");
        body = replace(body,"ture","Cur");
        body = replace(body,"cies","seez"); //prophecies
        body = replace(body,"ciez","seez"); //s->z already done
        body = replace(body,"iew","yoo");
        body = replace(body,".face",".feys");
        body = replace(body,"face","feys");
        body = replace(body,"acen","eysuhn"); //Don't get complacent
        body = replace(body,"ician","ishuhn"); //musician
        body = replace(body,"cism","sizuhm"); //anglicanism
        body = replace(body,"cial","shul");
        body = replace(body,".acq",".akw"); //might need refinement
        body = replace(body,"cque","ke");
        body = replace(body,"acquaint","uhkweynt");
        body = replace(body,"cing","sing"); //1.6.5 - odyssey test
        body = replace(body,"exce","ikse");
        body = replace(body,"excit","iksahyt");
        body = replace(body,"excis","eksahyz");
        body = replace(body,"ici","isi"); //Sicily
        body = replace(body,"iec","ees"); //Piece/Peace -> Pees
        body = replace(body,"eac","ees");
        body = replace(body,"ight","ahyt");
        body = replace(body,"cep","sep");
        body = replace(body,"cin","sin");
        body = replace(body,".cit",".sit");
        body = replace(body,"cip","sip");
        body = replace(body,"cif","sif"); //NOT sure
        body = replace(body,"icc","ik");
        body = replace(body,"icn","ikn");
        body = replace(body,"sce","se");
        body = 
            replace(body,"sci","si");
        body = replace(body,"scy","sahy");
        //body = replace(body,"sco","sko");
        body = replace(body,"cea","sea");
        body = replace(body,"nci","nsi"); //might need refinement
        body = replace(body,"ncy","nsee");
        body = replace(body,"cei","see");
        body = replace(body,"cee","see");
        body = replace(body,"cent","sent"); //odyssey
        body = replace(body,"it\n","it\n"); //Tacked on for suffix reasons
        body = replace(body,"ap\n","ap\n");
        //starting with c
        body = replace(body,".cy",".sahy");
        body = replace(body,".cir",".sur");
        body = replace(body,".cid",".sahyd");
        body = replace(body,".ci",".si");
        body = replace(body,".cer",".sur");
        body = replace(body,".ce",".se");
        body = replace(body,"ck","k");
        /*
        body = realReplace("QQQ",body,"C\n","k\n");
        body = realReplace("QQQ",body,"ch\n","k\n");
        */
        body = replace(body,"sc","sk");
        body = replace(body,"cy","see"); //1.4.3 - si->see
        body = replace(body,"ce","se");
        body = replace(body,"ca","ka");
        body = replace(body,"co","ko");
        body = replace(body,"cu","ku");
        body = replace(body,"ct","kt");
        body = replace(body,"cl","kl");
        body = replace(body,"cr","kr");
        body = realReplace("QQQ",body,".c",".k"); //This can possibly leave lowercase c's in the text, although I think that all properly spelled words should be covered here.
        body = realReplace("QQQ",body,"c\n","k\n"); //to stop mischief
        //END C'S
        //Not sure where to put this section
        //ss
        body = replace(body,"ss","s");
        body = replace(body,".be\n",".bee\n");
        body = replace(body,".maybe\n",".meybee\n");
        //gh
        body = replace(body,"gha","gah"); //This section needs work
        body = replace(body,"gho","goh");
        body = replace(body,"ought","awt");
        body = replace(body,"though","thoh");
        body = replace(body,"bough","bou");
        body = replace(body,"cough","kof");
        body = replace(body,"igh","ahy");
        body = replace(body,"gh\n","\n");
        body = replace(body,"gh","g");
        //to, too, two - Just a quick patch for those three words, not a general solution to any problem I can see
        body = replace(body,".to\n",".too\n");
        body = replace(body,".two\n",".too\n");
        //q at end
        body = realReplace("QQQ",body,"q\n","k\n");
        //w at end
        body = replace(body,".low\n",".loh\n"); //special cases
        body = replace(body,".row\n",".roh\n");
        body = replace(body,"ow\n","au\n");
        //.sy
        body = replace(body,".syr",".suhr"); //Moved up to e-enders
        body = replace(body,".syr",".sir");
        body = replace(body,".sly",".slahy");
        body = replace(body,".lying\n",".lahying\n");
        body = replace(body,".ly",".li");
        //sz->siz - The coward's way out. I need to sit down and make this thing more cohesive
        body = replace(body,"sz\n","siz\n");
        body = realReplace("qqq",body,"y\n","ee\n");
        body = realReplace("qqq",body,"ehee\n","ehy\n");
        body = realReplace("qqq",body,"ahee\n","ahy\n");
        body = realReplace("qqq",body,"eee\n","ey\n"); //fixing issues raised by y->ee as compared to other phonetics
        String[] temp = {"en","st","un","c","f","g","s","t",""};
        body = replace(body,"ctable\n","kteybuhl\n"); //save the c's!
        //whole words only (able, enable, stable, unable, cable, fable, gable, sable, table), keeping the prefix
        for(int i = 0; i<temp.length;i++)
            if(temp[i].equals("c"))
                body = replace(body,".kable\n",".keybuhl\n"); //c has already become k by this point
            else
                body = replace(body,"."+temp[i]+"able\n","."+temp[i]+"eybuhl\n");
        body = replace(body,"able\n","uhbuhl\n"); //This one is either "eybuhl" for a few short words or "uhbuhl" for all others
        body = replace(body,"ble\n","buhl\n");
        //The annoying part is the hodge-podgeness of English. The only workable route may be just to demand phonetic spelling in cases like "Tow"
        //Necessary --Moved down to make ease-of-use conversions easier
        body = replace(body,"th","T");
        body = replace(body,"sh","S");
        //body = replace(body,"ch","C"); //took some liberties here, capitalized the C to make room for the c->k/s conversion
        body = replace(body,"x","X"); //Consistency - x is really a compound character of ks.
        body = replace(body,"qu","ku");
        //body = replace(body,"q","ku");
        /*
        body = replace(body,"wa","ua"); //Unnecessary? I think not! I'm not sure why, but no.
        body = replace(body,"we","ue");
        body = replace(body,"wi","ui");
        body = replace(body,"wo","uo");
        body = replace(body,"wu","uu");
        */
        body = replace(body,"w","u"); //exception catcher
        if(debug_end_e){
            body = replace(body,"e\n","Q\n"); //Just for debugging
            body = replace(body,".TQ",".Te");
            body = replace(body,".bQ",".be");
            body = replace(body,".seQ",".seee");
            body = replace(body,".mQ",".me");
            body = replace(body,"eQ\n","ee\n");
            body = replace(body,"Qy\n","ey\n");
            body = replace(body,".hQ",".he");
            body = replace(body,".shQ",".she");
        }
        return body.substring(1,body.length()-1); //clipping first/last '\n'
    }

    private static String replace(String body, String target, String sub){
        return realReplace("",body,target,sub);
    }

    private static String realReplace(String sofar, String body, String target, String sub) {
        int target_size = target.length();
        int sub_size = sub.length();
        //'.'==' '
        if(target.startsWith(".")){
            body = replace(body,(" "+target.substring(1,target_size)),(" "+sub.substring(1,sub_size)));
            body = 
replace(body,("\n"+target.substring(1,target_size)),("\n"+sub.substring(1,sub_size))); /* //re- if(((target_size>=5)&&(!target.substring(1,5).equals("rere")))||(target_size<3)) //clumsy body = replace(body,".re"+target.substring(1,target_size),".ree"+sub.substring(1,target_size)); */ } if(target.endsWith("\n")){ //checks for spaces and for plurals, also does s->z conversion where necessary body = replace(body,(target.substring(0,target_size-1)+" "),(sub.substring(0,sub_size-1)+" ")); //space substitution if(sofar.length()<=2){ //that took longer than it should have. Anyone who can suggest improvements is welcome to try. if((!sofar.contains("z"))&&(!sofar.contains("l"))){ //I think contains() covers it. It saves time over endsWith() if it stops unnecessary calls to realReplace(), as long as it doesn't cut out possible permutations if(!sofar.contains("i"))// s->z if((target_size>=2)&&(target.charAt(target_size-2)!='s')&&(target.charAt(target_size-2)!='z')) //Double-checking s/z if(target.charAt(target_size-2)=='e') if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e')) body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"z\n")); else body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"ez\n")); //s->z else if(((target_size>=2)&&(target.charAt(target_size-2)=='y'))||(target_size<3)) //bug stopper if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e')) body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies\n"),(sub.substring(0,sub_size-1)+"z\n")); else body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies\n"),(sub.substring(0,sub_size-1)+"iez\n")); //s->z else body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s\n"),(sub.substring(0,sub_size-1)+"z\n")); //s->z /* //y body = realReplace("qqq",body,"ay\n","ey\n"); //stopgap, might want to revisit body = replace(body,"ey\n","ey\n"); body = 
realReplace("qqq",body,"oy\n","oi\n"); body = realReplace("qqq",body,"uy\n","ahy\n"); body = realReplace("qqq",body,"y\n","ee\n"); //might need generalized in replace() body = replace(body,"ty","tahy"); */ //ly, focus on y as of 1.7.4.3 - It might need some work if(target.equals("sly\n")) //special case body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n")); //ly if((target_size>=5)&&(target.substring(target_size-5,target_size-1).equals("able"))) body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"y\n"),(sub.substring(0,sub_size-4)+"lee\n")); //ably else if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='y')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"ily\n"),(sub.substring(0,sub_size-2)+"uhlee\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='o')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"pily\n"),(sub.substring(0,sub_size-1)+"uhlee\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"tily\n"),(sub.substring(0,sub_size-1)+"uhlee\n")); else body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n")); //y if((target_size>=2)&&(target.charAt(target_size-2)=='a')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-2)+"ey\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = 
realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-1)+"y\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='o')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-1)+"i\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='u')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"y\n"),(sub.substring(0,sub_size-2)+"ahy\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"py\n"),(sub.substring(0,sub_size-1)+"ee\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ty\n"),(sub.substring(0,sub_size-1)+"ee\n")); else body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly\n"),(sub.substring(0,sub_size-1)+"lee\n")); if((!sofar.contains("g"))&&(!sofar.contains("i"))&&(!sofar.contains("r"))){ //covers multiple if((!target.endsWith("g\n"))&&(!target.endsWith("gs\n"))&&(!target.endsWith("gz\n"))) //leave no base uncovered if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ie"))) body = realReplace(sofar+"g",body,(target.substring(0,target_size-3)+"ying\n"),(sub.substring(0,sub_size-1)+"ing\n")); //replacing 'ie' before gerund else if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"g",body,(target.substring(0,target_size-2)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //removing 'e' else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ping\n"),(sub.substring(0,sub_size-1)+"ing\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ting\n"),(sub.substring(0,sub_size-1)+"ing\n")); else
if((!target.endsWith("gs\n"))&&(!target.endsWith("gz\n"))) //no "ing\n" or s/z at end body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ing\n"),(sub.substring(0,sub_size-1)+"ing\n")); //no e, presumably ends in consonant if((!sofar.contains("a"))&&(!sofar.contains("d"))) //ish if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3)) if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"pish\n"),(sub.substring(0,sub_size-1)+"ish\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"tish\n"),(sub.substring(0,sub_size-1)+"ish\n")); else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ed")))||(target_size<3)) body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"ish\n"),(sub.substring(0,sub_size-1)+"ish\n")); if(!sofar.contains("a")) //able if((target_size>=2)&&(target.charAt(target_size-2)=='p')){ body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); } else if((target_size>=2)&&(target.charAt(target_size-2)=='t')){ body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"table\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); } else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3)) body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); else if(target.equals("fly\n")||target.equals("unfly\n")) body =
realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); else if(((target_size>=4)&&(target.substring(target_size-4,target_size-1).equals("ing")))||(target_size<4)) body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"eybuhl\n")); } if((!sofar.contains("g"))&&(!sofar.contains("d"))){ //covers multiple if(target_size>=2) //d at end if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3)) if(target.charAt(target_size-2)=='e') body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"ed\n")); //NOT st else if(target.charAt(target_size-2)=='s') body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"ed\n")); else if(target.substring(target_size-3,target_size-1).equals("se")) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d\n"),(sub.substring(0,sub_size-1)+"ed\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ped\n"),(sub.substring(0,sub_size-1)+"ed\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ted\n"),(sub.substring(0,sub_size-1)+"ed\n")); else if((target.charAt(target_size-2)!='s')||((target.substring(target_size-3,target_size-1).equals("ss")))) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed\n"),(sub.substring(0,sub_size-1)+"st\n")); //er if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e' else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = 
realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"per\n"),(sub.substring(0,sub_size-1)+"er\n")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"ter\n"),(sub.substring(0,sub_size-1)+"er\n")); else body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n")); } /* //ate, not bothering with forbiddances - Never mind if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e' else body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n")); */ //Why do these need to be dealt with here? //Because these permutations need to be available to figure out which \n grammars to apply //ed, ish, ly, ing, able, edly, ishly, ably, lying, eding, abling //Dirty method - add a recursion counter to replace() //6 max - ed ish ly ing able z //ablingly, lyingly - 3 //ablinger //s-z, ly-l, ing-g, d-d, ish-i, able-a //everything abides i, nothing abides s/l //nevermind, not much likes i either //a allows l/s/d, //a forbids a, i //d forbids d, i //g forbids d, g, i, a //i forbids s, g, i, a //er-r //r forbids g, i, a //r is forbidden by s, l, g, d //I think that forbiddance is total - no forbidden suffixes at any point before } } } for(int i = 0; i<=body.length()-target_size;i++) { if(body.substring(i,i+target_size).equals(target)) { body = body.substring(0,i)+sub+body.substring(i+target_size,body.length()); i+=(sub_size-target_size); } } return body; } }
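The suffix bookkeeping in realReplace() is easier to follow as a table. Each letter appended to `sofar` records a suffix already applied (z = plural s->z, l = -ly/-y, g = -ing, i = -ish, a = -able, d = -ed, r = -er), and the nested if-checks decide which suffix may be stacked next. The sketch below restates those guards as a standalone class, assuming they behave exactly as written in the code above and ignoring the extra checks on the target's last letters (p/t doubling, "ly", "ed", and so on). The names SuffixStackSketch, canAdd and allStacks are hypothetical, not part of the transliterator.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical restatement of the suffix guards in realReplace().
 * Suffix codes, as used in the "sofar" string:
 * z = plural s->z, l = -ly/-y, g = -ing, i = -ish, a = -able, d = -ed, r = -er
 */
public class SuffixStackSketch {
    static final String SUFFIXES = "zlgiadr";

    /** True if any of the given suffix codes has already been applied. */
    static boolean hasAny(String sofar, String codes) {
        for (char c : codes.toCharArray()) {
            if (sofar.indexOf(c) >= 0) {
                return true;
            }
        }
        return false;
    }

    /** Mirrors the nested if-checks: may suffix 'next' follow the suffixes in 'sofar'? */
    static boolean canAdd(String sofar, char next) {
        // Outer guards: at most three stacked suffixes, and nothing after a plural or -ly.
        if (sofar.length() > 2 || hasAny(sofar, "zl")) {
            return false;
        }
        switch (next) {
            case 'z': return !hasAny(sofar, "i");     // plural: not after -ish
            case 'l': return true;                     // -ly/-y: only the outer guards apply
            case 'g': return !hasAny(sofar, "gir");   // -ing block
            case 'i': return !hasAny(sofar, "girad"); // -ish: inside the -ing block, plus no -able/-ed
            case 'a': return !hasAny(sofar, "gira");  // -able: inside the -ing block, plus no -able
            case 'd': return !hasAny(sofar, "gd");    // -ed block
            case 'r': return !hasAny(sofar, "gd");    // -er shares the -ed block
            default: throw new IllegalArgumentException("unknown suffix code: " + next);
        }
    }

    /** Depth-first enumeration of every suffix stack the guards allow. */
    static void collect(String sofar, List<String> out) {
        for (char next : SUFFIXES.toCharArray()) {
            if (canAdd(sofar, next)) {
                String stack = sofar + next;
                out.add(stack);
                collect(stack, out);
            }
        }
    }

    static List<String> allStacks() {
        List<String> out = new ArrayList<>();
        collect("", out);
        return out;
    }

    public static void main(String[] args) {
        List<String> stacks = allStacks();
        System.out.println(stacks.size() + " suffix stacks: " + stacks);
    }
}
```

Under these assumptions the guards permit 49 distinct stacks (for example "agl", i.e. -ablingly, but never "gd"), so a single word-final rule can fan out into dozens of extra realReplace() passes over the whole text, which goes some way toward explaining the long run times.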
Turos (Author) Posted January 19, 2012

"Many of the cases of .a->.uh are actually intentional, although it varies by word, and so is still worth double-checking."

Ah, makes sense. Awesome! I'll check another article tomorrow and watch in awe at how shiny it looks after conversion.
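The ".a->.uh" shorthand follows the rule notation used throughout replaceLetters(). As realReplace() shows, a leading "." in a target means "at the start of a word" (the rule is retried after a space and after a newline), and a trailing "\n" means "at the end of a word" (the rule is also tried before a space). Below is a minimal sketch of the same idea using java.util.regex lookarounds, assuming words are separated only by spaces and newlines, as they are after spaceEnds(). AnchoredRuleSketch and applyRule are hypothetical names. Unlike the original, it strips the "." or "\n" anchor from the replacement only when one is actually present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the transliterator's anchored rule notation. */
public class AnchoredRuleSketch {
    // Zero-width checks: "not preceded / not followed by a character other than space or newline".
    private static final String WORD_START = "(?<![^ \\n])";
    private static final String WORD_END = "(?![^ \\n])";

    /** Applies one rule such as ".accr" -> ".uhkr" (word start) or "ice\n" -> "ahys\n" (word end). */
    static String applyRule(String body, String target, String sub) {
        boolean atStart = target.startsWith(".");
        boolean atEnd = target.endsWith("\n");
        String core = target.substring(atStart ? 1 : 0, target.length() - (atEnd ? 1 : 0));
        if (core.isEmpty()) {
            return body; // a bare anchor is not a meaningful rule
        }
        String replacement = sub;
        if (atStart && replacement.startsWith(".")) {
            replacement = replacement.substring(1);
        }
        if (atEnd && replacement.endsWith("\n")) {
            replacement = replacement.substring(0, replacement.length() - 1);
        }
        String regex = (atStart ? WORD_START : "") + Pattern.quote(core) + (atEnd ? WORD_END : "");
        return Pattern.compile(regex).matcher(body).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(applyRule("about a cat", ".a", ".uh"));   // uhbout uh cat
        System.out.println(applyRule("accrue", ".accr", ".uhkr"));   // uhkrue
        System.out.println(applyRule("nice ice\nslice", "ice\n", "ahys\n"));
    }
}
```

Because the lookarounds are zero-width, two adjacent words can share the same separating space, so a rule anchored at both ends (such as ".a\n") still converts every standalone "a" in "a a".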
Kurkistan Posted January 19, 2012 (edited)

Turos: "Ah, makes sense. Awesome! I'll check another article over tomorrow and watch in awe at how shiny it looks after conversion"

I warn you, converting things has gotten slightly slower. The Odyssey went from 8 minutes to 2 hours, 16 minutes.

EDIT: That's a 1,600% increase, for all you folks at home.

Edited January 19, 2012 by Kurkistan
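For the record, the arithmetic checks out: 2 hours 16 minutes is 136 minutes, and (136 - 8) / 8 = 16, i.e. a 1,600% increase (the run became 17 times as long). A likely contributor in the posted code is that every replace() call scans the whole text and, on each match, rebuilds the entire string with substring concatenation. Below is a minimal sketch of a single-pass alternative using StringBuilder; FastReplaceSketch, replaceLiteral and percentIncrease are hypothetical names. It performs standard non-overlapping, left-to-right replacement (the same result as String.replace), whereas the original loop resumes scanning before the end of the text it just substituted, so the two can differ when matches share characters.

```java
/** Hypothetical helpers illustrating a linear-time literal replace. */
public class FastReplaceSketch {
    /** Percent increase from oldValue to newValue (any consistent unit). */
    static double percentIncrease(double oldValue, double newValue) {
        return (newValue - oldValue) / oldValue * 100.0;
    }

    /**
     * Replaces every non-overlapping occurrence of target, left to right,
     * appending into one StringBuilder instead of rebuilding the String per match.
     */
    static String replaceLiteral(String body, String target, String sub) {
        if (target.isEmpty()) {
            return body; // nothing sensible to replace
        }
        StringBuilder out = new StringBuilder(body.length());
        int from = 0;
        int hit;
        while ((hit = body.indexOf(target, from)) >= 0) {
            out.append(body, from, hit).append(sub); // copy the gap, then the substitution
            from = hit + target.length();
        }
        out.append(body, from, body.length()); // copy the tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        int oldMinutes = 8;
        int newMinutes = 2 * 60 + 16; // 2 hours, 16 minutes
        System.out.println(percentIncrease(oldMinutes, newMinutes) + "%"); // prints 1600.0%
        System.out.println(replaceLiteral("\nthe ship sails\n", "sh", "S"));
    }
}
```

Each call is linear in the length of the text, so the total cost then grows mainly with the number of rules applied rather than with the number of matches times the text length.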
dhalagirl Posted January 19, 2012

Turos and Kurkistan -- Firstly, you guys are awesome. Secondly, I'm envious of the amount of free time you have.
Kurkistan Posted January 20, 2012 (edited)

dhalagirl: "Turos and Kurkistan -- Firstly, you guys are awesome. Secondly, I'm envious of the amount of free time you have."

Firstly: Thank you. Secondly:

Turos: "Ah, makes sense. Awesome! I'll check another article over tomorrow and watch in awe at how shiny it looks after conversion"

You may want to hold off for a moment. I'm doing some rather large revisions to boost efficiency, which are having odd side effects.

EDIT: Okay, done with that. I also added some .pie rules. Essentially, I had made a very foolish programming error that ran about 20 times as many replace() calls as necessary. This spiked run times by a small amount. As evidence, running the Odyssey now takes only 18 minutes, 16 seconds, despite there being more grammars than the last time I ran it.

EDIT 2: Made periodMover() a bit more efficient and able to handle an arbitrary number of periods, and added a few rules for xious\n, irst\n, stion\n, and the pp's.

/** * Goal: Provide an easy means of transliterating Roman letters into Alethi script using Turos's font conventions.
* * * @author Kurkistan, with significant developmental input from Turos * @date 01/20/2012 * @version 1.8.9.4 */ import java.io.FileReader; import java.io.FileWriter; import java.io.BufferedWriter; import java.io.InputStreamReader; import java.io.File; import java.io.PrintWriter; import java.io.IOException; import java.util.Scanner; import java.io.BufferedReader; import java.util.Arrays; public class AlethiTransliterator_1_8_9_4 { static boolean debug_char = false; static boolean debug_end_e = false; static boolean remove_illegal = true; static boolean add_CR = true; /* static String Targets = ""; static int min = 200; static int max = 400; */ static int Count = 0; static boolean Counting = true; public static void main (String[] arg) throws IOException{ Scanner input=new Scanner(System.in); System.out.print("Enter input file (full name of file in same directory): "); String temp = input.next(); //temp = "Test.txt"; final double startTime = System.currentTimeMillis(); final double endTime; try { String alethi = convertText(temp); if(alethi.equals("&")) return; //putting carriage-returns back in to make it look pretty in Notepad. I can't tell what else they might do. 
if(add_CR) for(int i = 0; i<alethi.length();i++) if(alethi.charAt(i)=='\n') alethi = alethi.substring(0,i)+"\r"+alethi.substring(i++,alethi.length()); //writeFile(Targets,"TEMP.txt"); temp = "Alethi_"+temp; writeFile(alethi,temp); if(debug_char){ String violations = allowedCharacters(alethi); //debugging blatant errors if(!violations.equals("")) System.out.println("Unauthorized sections in text (Line:Violation):"+"\n"+violations); } } finally { endTime = System.currentTimeMillis(); } final double duration = endTime - startTime; System.out.println("Execution time: "+(duration/1000)+" seconds"); } private static String convertText(String roman) throws IOException { roman = readFile(roman); //text file if((roman.length()==1)&&(roman.charAt(0)=='&')) //invalid input, halt program return "&"; if(remove_illegal) roman = removeCharacters(roman); roman = periodMover(roman); roman = spaceEnds(roman); String alethi = replaceLetters(roman); return unSpaceEnds(alethi); } /** * Load a text file's contents as a <code>String</code>.
* * @param file The input file * @return The file contents as a <code>String</code> * @exception IOException IO Error */ private static String readFile(String file) throws IOException { String whole = ""; try { BufferedReader in = new BufferedReader(new FileReader(file)); String str; while ((str = in.readLine()) != null) { whole = whole + str + '\n'; //process(str); } in.close(); } catch (IOException e) { System.out.println("File not in directory or misspelled."); return "&"; } whole="\n"+whole.toLowerCase(); //convert to lower - keeping an extra \n at the end and beginning for replacement ease of use, will get rid of it return whole; } private static void writeFile(String text, String destination) throws IOException { File file = new File(destination); boolean exist = file.createNewFile(); if (!exist) { System.out.println("Output file already exists."); System.exit(0); } else { FileWriter fstream = new FileWriter(destination); BufferedWriter out = new BufferedWriter(fstream); out.write(text); out.close(); System.out.println("File created successfully."); } } private static String allowedCharacters(String body) { //c, q, w, x, th, sh, ch - Forbidden; I assume no lowercase versions of the special characters (C, X) //\n, ' ', '.', C, S/s, T/t, X, - Allowed char[] library = new char[29]; String[] pairs = {"th","sh","ch"}; //These shouldn't trigger unless I made a serious mistake in the "necessary" section.
String violations = ""; int line = 1; //for all of those +1ers out there int target_size = 2; int search = body.length() - target_size; for(int j = 0;j<pairs.length;j++){ line = 1; /* restart the line count for each pair */ for(int i = 0; i<=search;i++) if(body.charAt(i)=='\n') line++; else if(body.substring(i,i+target_size).equals(pairs[j])) violations = violations + (line+":"+pairs[j]) + "; "; } library[0] = '\n'; library[1] = ' '; library[2] = '.'; library[3] = 'C'; library[4] = 'S'; library[5] = 'T'; library[6] = 'X'; int place = 7; for(int i = 97; i <=122; i++){ if((i!=99)&&(i!=113)&&(i!=119)&&(i!=120)) //c, q, w, and x library[place++] = (char)i; } line = 1; //resetting for(int i = 0;i<body.length();i++) if(body.charAt(i)=='\n') line++; else if(Arrays.binarySearch(library,body.charAt(i))<0) //not in library violations = violations + (line+":"+body.charAt(i)) + "; "; return violations; } private static String removeCharacters(String body) { char[] library = new char[56]; library[0] = '\t'; //tab library[1] = '\n'; library[2] = ' '; library[3] = '.'; int place = 4; for(int i = 65; i <=90; i++) library[place++] = (char)i; for(int i = 97; i <=122; i++) library[place++] = (char)i; for(int i = 0; i < body.length(); i++) if(Arrays.binarySearch(library,body.charAt(i))<0) //I felt embarrassed by my earlier search algorithm. if((body.charAt(i)=='?')||(body.charAt(i)=='!')) body = body.substring(0,i)+"."+body.substring(i+1,body.length()); else body = body.substring(0,i)+body.substring(i--+1,body.length()); return body; } /** * In the Alethi alphabet, sentences start with a period '.' and don't end with anything. */ private static String periodMover(String body) { int start = 0; for(int i=0;i<body.length();i++) { if(body.charAt(i)=='.'){ while((i<body.length())&&(body.charAt(i)=='.')) //multiples body = body.substring(0,start)+"."+body.substring(start,i)+body.substring((i++)+1,body.length()); while(i<body.length()) if(!inAlphabet(body.charAt(i))) i++; else break; //Yes, the cardinal sin.
start = i; } else if(body.charAt(i)=='\n') start=i+1; //Doesn't allow sentences to continue after true line breaks. Enables no-period headers and whatnot. } return body; } private static boolean inAlphabet(char character) { char[] library = new char[26]; int place = 0; for(int i = 97; i <=122; i++) library[place++] = (char)i; if(Arrays.binarySearch(library,character)>=0) //I felt embarrassed by my earlier search algorithm. return true; return false; } private static String spaceEnds(String body){ for(int i=0;i<body.length();i++) if(body.charAt(i)=='.') body = body.substring(0,i+1)+" "+body.substring((i++)+1,body.length()); else if(body.charAt(i)=='\n'){ body = body.substring(0,i)+" \n "+body.substring(i+1,body.length()); i+=2; } //System.out.println(body); return body; } private static String unSpaceEnds(String body){ for(int i=1;i<body.length()-2;i++) if(body.charAt(i)=='.') body = body.substring(0,i+1)+body.substring(i+2,body.length()); else if(body.charAt(i)=='\n') body = body.substring(0,i-1)+"\n"+body.substring((i--)+2,body.length()); if(body.charAt(body.length()-2)=='.') body = body.substring(0,body.length()-1); else if(body.charAt(body.length()-2)=='\n') body = body.substring(0,body.length()-3)+"\n"; return body.substring(1,body.length()-1); //clipping first/last '\n';; } public static void test() { String body = "\nbutler\n"; String target = "ap\n"; String sub = "op\n"; System.out.println(replace(body,target,sub)); int target_size = target.length(); int sub_size = sub.length(); String sofar = ""; int j = 2; if((target_size>=2)&&(target.charAt(target_size-2)=='p')){ body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n")); } for(int i = 0; i<=body.length()-target_size;i++) { if(body.substring(i,i+target_size).equals(target)) { body = 
body.substring(0,i)+sub+body.substring(i+target_size,body.length()); i+=(sub_size-target_size); } } System.out.println(body); } /** * Special characters: For t, use lower case t. For th, use capital T. For s, use lower case s. For sh, use capital S. For ch, use c. X will print a combination of k and s. For q and w, use your imagination. Technically speaking, q is a combination of k and u. W is basically a combination of a long u ("oo") and any other vowel: a e i o and short u ("uh") */ private static String replaceLetters(String body) { //Ease of use //1.3.5-Threw in an If statement in the replace function to deal with space and \n at the same time //ph body = replace(body,"ph","f"); //anti- body = replace(body,".anti",".antahy"); //wh body = replace(body,"who\n","hoo\n"); body = replace(body,"where","huair"); //changed w to u body = replace(body,"whir","huur"); body = replace(body,"wh","hu"); //Might need more permutations body = replace(body,".accr",".uhkr"); //many many many body = replace(body,".acci",".aksi"); body = replace(body,".accord",".uhkawrd"); body = replace(body,".accomp",".uhkuhmp"); body = replace(body,".acco",".uhko"); body = replace(body,".accustom\n",".uhkuhstuhm\n"); body = replace(body,".accolade\n",".akuhleyd\n"); body = replace(body,".accus",".uhkyooz"); body = replace(body,".accurs",".uhkurs"); body = replace(body,".accur",".akyer"); body = replace(body,".accum",".uhkyoom"); body = replace(body,".accout",".uhkoot"); body = replace(body,".accoun",".uhkount"); body = replace(body,".acce",".akse"); //the dreaded double c's body = replace(body,".ecc",".eks"); body = replace(body,"ucca","uhka"); body = replace(body,"ucco","uhko"); body = replace(body,"uccu","uhku"); body = replace(body,".occ",".uhk"); body = replace(body,"ucce","uhkse"); body = replace(body,"ucci","uhksi"); body = replace(body,"occup","okyuh"); //very special case body = replace(body,"occa","uhkah"); body = replace(body,"occi","oksi"); body = replace(body,"occe","ochee"); //?
body = replace(body,"occo","okuh"); body = replace(body,"occu","okuh"); //Just went down the list on http://www.morewords.com/contains/cc - Useful, if laborious //E at end - Some interference possible with C's body = replace(body,"use\n","yooz\n"); body = replace(body,"used\n","yoozd\n"); //special case //Note: Need to make sure that plurals of e-enders are covered, i.e. wives. body = replace(body,"like\n","lahyk\n"); body = replace(body,"ole\n","ohl\n"); //hyperbole will suffer body = replace(body,"ose\n","ohz\n"); body = replace(body,"ame\n","eym\n"); body = replace(body,"ese\n","eez\n"); body = replace(body,"have\n","hav\n"); body = replace(body,"ave\n","eyv\n"); body = replace(body,"eive\n","eev\n"); body = replace(body,"vive\n","vahyv\n"); body = replace(body,"ive\n","iv\n"); //body = replace(body,"ever\n","ever\n"); body = replace(body,"eve\n","eev\n"); //HOWEVER body = replace(body,"eever\n","ever\n"); body = replace(body,"ile\n","ahyl\n"); //System.out.println(replace(replace("while ","wh","hu"),"ile\n","ahyl\n")); //huahyl body = replace(body,"gle\n","guhl\n"); body = replace(body,".key\n",".kee\n"); //special body = realReplace("QQQ",body,".keys\n",".kees\n"); body = replace(body,"base\n","beys\n"); //And now the ends-with function on scrabblefinder.com was useful body = replace(body,"case\n","keys\n"); body = replace(body,"chase\n","Ceys\n"); //ch == C body = replace(body,"Case\n","Ceys\n"); //necessary? 
body = replace(body,"erase\n","ihreys\n"); body = replace(body,"ase\n","eez\n"); body = replace(body,"olve\n","olv\n"); body = replace(body,"alve\n","ahv\n"); body = replace(body,"elve\n","elv\n"); body = replace(body,"some\n","suhm\n"); body = replace(body,"come\n","cuhm\n"); //Need to move this up body = replace(body,"ome\n","ohm\n"); body = replace(body,"ttle\n","tl\n"); body = replace(body,"tle\n","tl\n"); //This is what dictionary.com said to do, and I live to serve body = replace(body,".discipline\n",".disipline\n"); body = replace(body,"ine\n","ahyn\n"); body = replace(body,".one\n",".uuhn\n"); body = replace(body,"done\n","duhn\n"); body = replace(body,"none\n","nuhn\n"); body = replace(body,"one\n","ohn\n"); body = replace(body,"ake\n","eyk\n"); body = replace(body,"ope\n","ohp\n"); body = replace(body,"rue\n","roo\n"); body = replace(body,"ife\n","ahyf\n"); body = replace(body,"bead\n","beed\n"); body = replace(body,".read\n",".reed\n"); body = replace(body,"nead\n","need\n"); body = replace(body,"lead\n","leed\n"); body = replace(body,"ead\n","ed\n"); //general body = replace(body,"ade\n","eyd\n"); //ere - their vs there body = realReplace("QQQ",body,"ere\n","eir\n"); body = replace(body,".are\n",".ahr\n"); body = replace(body,"are\n","air\n"); body = replace(body,"oke\n","ohk\n"); body = replace(body,"tire","tahyuhr"); //NOT \n or e body = replace(body,"aire\n","air\n"); //body = replace(body,"ire\n","yuhr\n"); //? 
body = replace(body,"ype\n","ahyp\n"); body = replace(body,"urge\n","urj\n"); body = replace(body,"erge\n","urj\n"); //Not a mistake body = replace(body,"arge\n","ahrj\n"); body = replace(body,"orge\n","awrj\n"); body = replace(body,"ime\n","ahym\n"); body = replace(body,"sle\n","ahyl\n"); body = replace(body,"promise\n","promis\n"); body = replace(body,"aise\n","eyz\n"); body = replace(body,"ise\n","ahyz\n"); body = replace(body,"lse\n","ls\n"); body = replace(body,"igue\n","eeg\n"); body = replace(body,"sce\n","es\n"); body = replace(body,"que\n","k\n"); body = replace(body,"udge\n","uhj\n"); body = replace(body,"dge\n","j\n"); //NOT sure body = replace(body,"age\n","aij\n"); //gue - This one was irritating, might not be right body = replace(body,"logue\n","awg\n"); body = replace(body,"gogue\n","awg\n"); body = replace(body,".morgue\n",".mawrg\n"); body = replace(body,".fugue\n",".fyoog\n"); body = replace(body,".segue\n",".segwey\n"); body = replace(body,"rgue\n","rgyoo\n"); body = replace(body,"gue\n","eeg\n"); //-nge body = replace(body,"nge\n","nj\n"); //problem with sing vs singe not really being separable at the gerund-testing level body = replace(body,"sinjing\n","singing\n"); //comprehensive fix for gerund mishaps body = replace(body,"slinjing\n","slinging\n"); body = replace(body,"strinjing\n","stringing\n"); body = replace(body,"swinjing\n","swinging\n"); body = replace(body,"brinjing\n","bringing\n"); body = replace(body,"flinjing\n","flinging\n"); body = replace(body,"prinjing\n","pringing\n"); body = replace(body,".winjing\n",".winging\n"); body = replace(body,".zinjing\n",".zinging\n"); body = replace(body,".dinjing\n",".dinging\n"); body = replace(body,".pinjing\n",".pinging\n"); //END E's //s at end - 1.7.4.5 -> unneeded, I think //body = replace(body,"es\n","ez\n"); //Needs to go before c->s conversion, since C's are all soft S's //This is a big thing.
I moved the c down mainly to allow for the s->z converter to do its job, and the judgement on whether or not this messes things up is pending. //START C 1.7 - moved so that higher number of characters in target gets preference, blocks kept cohesive //Stolen from the "necessary" bin. body = replace(body,"ch","C"); //Although both versions of C work, I'm assuming capitalized, so no lowercase c's are allowed in the text body = replace(body,"accent","aksent"); body = replace(body,"exercise\n","eksersahyz\n"); body = replace(body,".once",".wuhns"); body = replace(body,"preface\n","prefis\n"); //special body = replace(body,"icise\n","uhsahyz\n"); body = replace(body,"rcise\n","ruhsahyz\n"); body = replace(body,".tacit\n",".tasit\n"); body = replace(body,"ciate\n","sheeeyt\n"); body = replace(body,"vate\n","vit\n"); //pulled from E section, might be a sign of things to come body = replace(body,"literate\n","literit\n"); body = replace(body,"ate\n","eyt\n"); body = replace(body,"cision\n","sizhuhn\n"); body = replace(body,"cise\n","sahys\n"); body = replace(body,"cist\n","sist\n"); body = replace(body,"uce\n","us\n"); body = replace(body,"uces\n","usez\n"); //z incorporated body = replace(body,"uced\n","usst\n"); //D's body = replace(body,"came\n","keym\n"); body = replace(body,"came","kamuh"); body = replace(body,"ct","kt"); //factual body = replace(body,"tual\n","Cual\n"); body = replace(body,".acid\n",".asid\n"); body = replace(body,".aci",".uhsi"); body = replace(body,"ierce\n","eers\n"); body = replace(body,"ince\n","ins\n"); //body = replace(body,".ance",".ahns"); body = replace(body,".trance",".trahns"); body = replace(body,"dance\n","dahns\n"); body = replace(body,"Cance","Cahns"); body = replace(body,"cance","cahns"); body = replace(body,"lance","lahns"); body = replace(body,"vance","vahns"); body = replace(body,"ance\n","uhns\n"); body = replace(body,"all\n","awl\n"); body = replace(body,".supp",".suhpp"); //just a general rule body = replace(body,"appa","apuh");
body = replace(body,"ppen","pen"); //double p's, might NOT be done body = replace(body,"pplet\n","plit\n"); body = replace(body,"pple\n","puhl\n"); body = realReplace("QQQ",body,".supplement\n",".suhpluhment\n"); //special case body = replace(body,"ppl","puhl"); body = replace(body,"upp\n","uhp\n"); body = replace(body,"oppor","oper"); body = replace(body,"opp","uhp"); body = replace(body,"ypp","ip"); body = replace(body,"pp","p"); //Last ditch, should cover most before this body = replace(body,"tice\n","tis\n"); body = replace(body,"arice\n","eris\n"); body = replace(body,"orice\n","uhis\n"); body = replace(body,"cipice\n","suhpis\n"); //patch for precipice body = replace(body,"ipice\n","uhpis\n"); body = replace(body,".vice\n",".vahys\n"); body = replace(body,"vice\n","vis\n"); body = replace(body,"ice\n","ahys\n"); //Long S. NOT sure about \n's body = replace(body,"egy\n","ijee\n"); //possibilities/strategies fix, I have no idea how they ended up "kiez" body = replace(body,"ity\n","itee\n"); body = replace(body,"ite\n","ahyt\n"); body = replace(body,"irst\n","urst\n"); body = replace(body,"ong\n","ong\n"); body = replace(body,"ull\n","ool\n"); body = replace(body,"cide\n","sahyd\n"); body = replace(body,"ide\n","ahyd\n"); body = replace(body,"ence\n","ens\n"); body = replace(body,"rend\n","rend\n"); //1.8.9 Pie- body = replace(body,"piety","pahyitee"); body = replace(body,".pier\n",".peer\n"); body = replace(body,".pie\n",".pahy\n"); body = replace(body,".pie",".pee"); body = replace(body,"ces\n","seez\n"); body = replace(body,"cez\n","seez\n"); //In case of S->Z body = replace(body,"ce\n","s\n"); body = replace(body,"ci\n","sahy\n"); body = replace(body,"oy\n","oi\n"); body = replace(body,"ace\n","eys\n"); body = replace(body,".chull\n",".as\n"); body = replace(body,".chull",".uhs"); //Assoc- body = replace(body,"ely\n","lee\n"); //MUST BE LAST IN \N body = replace(body,".scie",".sahye"); //For Science! body = replace(body,"sciou","shuh"); //For Conscience!
        body = replace(body,"cious","shuhs"); //For Ithaca!
        body = replace(body,"scio","shuh");
        body = replace(body,"scie","shuh");
        body = replace(body,"ply\n","plahy\n");
        body = replace(body,".by\n",".bahy\n");
        body = replace(body,".my\n",".mahy\n");
        body = replace(body,".die\n",".dahy\n");
        body = replace(body,".dye\n",".dahy\n");
        body = replace(body,".bye\n",".bahy\n"); //conflict
        body = replace(body,"hype","hahype");
        body = replace(body,"hypo","hahypo");
        body = replace(body,"hypn","hipn");
        body = replace(body,"hyphen","hahyfuhn");
        body = replace(body,"hyfen","hahyfuhn"); //ph->f
        body = replace(body,"yp","ip");
        body = replace(body,"duct","duhkt");
        body = replace(body,"stion","sCuhn"); //1.8.9.4
        body = replace(body,"tion","Suhn"); //1.8
        body = replace(body,"ssion","Suhn"); //1.8.6
        body = replace(body,"sion","zhuhn");
        body = replace(body,"cean","Suhn");
        body = replace(body,"ture","Cur");
        body = replace(body,"cies","seez"); //prophecies
        body = replace(body,"ciez","seez"); //s->z already done
        body = replace(body,"iew","yoo");
        body = replace(body,".face",".feys");
        body = replace(body,"face","feys");
        //For-
        body = replace(body,".fore",".fohr");
        body = replace(body,".for",".fohr");
        //ore, as in fore, bore
        body = replace(body,"ore","ohr");
        body = replace(body,"acen","eysuhn"); //Don't get complacent
        body = replace(body,"ician","ishuhn"); //musician
        body = replace(body,"cism","sizuhm"); //anglicanism
        body = replace(body,"cial","shul");
        body = replace(body,".acq",".akw"); //might need refinement
        body = replace(body,"cque","ke");
        body = replace(body,"acquaint","uhkweyeynt");
        body = replace(body,"cing","sing"); //1.6.5 - odyssey test
        body = replace(body,"exce","ikse");
        body = replace(body,"excit","iksahyt");
        body = replace(body,"excis","eksahyz");
        body = replace(body,"ici","isi"); //Sicily
        body = replace(body,"iec","ees"); //Piece/Peace -> Pees
        body = replace(body,"eac","ees");
        body = replace(body,"ight","ahyt");
        body = replace(body,"cep","sep");
        body = replace(body,"cin","sin");
        body = replace(body,".cit",".sit");
        body = replace(body,"cip","sip");
        body = replace(body,"cif","sif"); //NOT sure
        body = replace(body,"icc","ik");
        body = replace(body,"icn","ikn");
        body = replace(body,"sce","se");
        body = replace(body,"sci","si");
        body = replace(body,"scy","sahy");
        //body = replace(body,"sco","sko");
        body = replace(body,"cea","sea");
        body = replace(body,"nci","nsi"); //might need refinement
        body = replace(body,"ncy","nsee");
        body = replace(body,"cei","see");
        body = replace(body,"cee","see");
        body = replace(body,"cent","sent"); //odyssey
        body = replace(body,"it\n","it\n"); //Tacked on for suffix reasons
        body = replace(body,"ap\n","ap\n");
        //starting with c
        body = replace(body,".cy",".sahy");
        body = replace(body,".cir",".sur");
        body = replace(body,".cid",".sahyd");
        body = replace(body,".ci",".si");
        body = replace(body,".cer",".sur");
        body = replace(body,".ce",".se");
        body = replace(body,"ck","k");
        /*
        body = realReplace("QQQ",body,"C\n","k\n");
        body = realReplace("QQQ",body,"ch\n","k\n");
        */
        body = replace(body,"sc","sk");
        body = replace(body,"cy","see"); //1.4.3 - si->see
        body = replace(body,"ce","se");
        body = replace(body,"ca","ka");
        body = replace(body,"co","ko");
        body = replace(body,"cu","ku");
        body = replace(body,"ct","kt");
        body = replace(body,"cl","kl");
        body = replace(body,"cr","kr");
        body = realReplace("QQQ",body,".c",".k"); //This can possibly leave lowercase c's in the text, although I think that all properly spelled words should be covered here.
        body = realReplace("QQQ",body,"c\n","k\n"); //to stop mischief
        //END C'S
        //Not sure where to put this section
        //ss
        body = replace(body,"ss","s");
        body = replace(body,".be\n",".bee\n");
        body = replace(body,".maybe\n",".meybee\n");
        //gh
        body = replace(body,"gha","gah"); //This section needs work
        body = replace(body,"gho","goh");
        body = replace(body,"ought","awt");
        body = replace(body,"though","thoh");
        body = replace(body,"bough","bou");
        body = replace(body,"cough","kof");
        body = replace(body,"igh","ahy");
        body = replace(body,"gh\n","\n");
        body = replace(body,"gh","g");
        //to, too, two - Just a quick patch for those three words, not a general solution to any problem I can see
        body = replace(body,".to\n",".too\n");
        body = replace(body,".two\n",".too\n");
        //q at end
        body = realReplace("QQQ",body,"q\n","k\n");
        //w at end
        body = replace(body,".low\n",".loh\n"); //special cases
        body = replace(body,".row\n",".roh\n");
        body = replace(body,"ow\n","au\n");
        //.sy
        body = replace(body,".syr",".suhr"); //Moved up to e-enders
        body = replace(body,".syr",".sir");
        body = replace(body,".sly",".slahy");
        body = replace(body,".lying\n",".lahying\n");
        body = replace(body,".ly",".li");
        //sz->siz - The coward's way out. I need to sit down and make this thing more cohesive
        body = replace(body,"sz\n","siz\n");
        body = replace(body,"pie\n","pahy\n");
        // NOT normal, aka special
        body = realReplace("qqq",body,".or\n",".awr\n");
        body = realReplace("qqq",body,"y\n","ee\n");
        body = realReplace("qqq",body,"ehee\n","ehy\n");
        body = realReplace("qqq",body,"ahee\n","ahy\n");
        body = realReplace("qqq",body,"eee\n","ey\n"); //fixing issues raised by y->ee as compared to other phonetics
        String[] temp = {"en","st","un","c","f","g","s","t",""};
        body = replace(body,"ctable\n","kteybuhl\n"); //save the c's!
        for(int i = 0; i<temp.length;i++){
            if(temp[i].equals("c")) //c's have already become k's by this point
                body = replace(body,".kable\n",".keybuhl\n");
            else //leading '.' = whole word only: "able", "enable", "stable", "table", ...
                body = replace(body,"."+temp[i]+"able\n","."+temp[i]+"eybuhl\n");
        }
        body = replace(body,"able\n","uhbuhl\n"); //This one is either "eybuhl" for a few short words or "uhbuhl" for all others
        body = replace(body,"ble\n","buhl\n");
        //x's
        body = replace(body,".xy",".zi");
        body = replace(body,"xious","kSuhs");
        //General fixer for suffixes
        //body = replace(body,"\n","\n");
        //The annoying part is the hodge-podgeness of English. The only workable route may be just to demand phonetic spelling in cases like "Tow"
        //Necessary --Moved down to make ease-of-use conversions easier
        body = replace(body,"th","T");
        body = replace(body,"sh","S");
        //body = replace(body,"ch","C"); //took some liberties here, capitalized the C to make room for the c->k/s conversion
        body = replace(body,"x","X"); //Consistency - x is really a compound character of ks.
        body = replace(body,"qu","ku");
        //body = replace(body,"q","ku");
        /*
        body = replace(body,"wa","ua"); //Unnecessary? I think not! I'm not sure why, but no.
        body = replace(body,"we","ue");
        body = replace(body,"wi","ui");
        body = replace(body,"wo","uo");
        body = replace(body,"wu","uu");
        */
        body = replace(body,"w","u");
        //exception catcher
        if(debug_end_e){
            body = replace(body,"e\n","Q\n"); //Just for debugging
            body = replace(body,".TQ",".Te");
            body = replace(body,".bQ",".be");
            body = replace(body,".seQ",".seee");
            body = replace(body,".mQ",".me");
            body = replace(body,"eQ\n","ee\n");
            body = replace(body,"Qy\n","ey\n");
            body = replace(body,".hQ",".he");
            body = replace(body,".shQ",".she");
        }
        return body;
    }

    private static String replace(String body, String target, String sub){
        return realReplace("",body,target,sub);
    }

    private static String realReplace(String sofar, String body, String target, String sub)
    {
        int target_size = target.length();
        int sub_size = sub.length();
        /*
        if((min<Count++)&&(max>Count))
            Targets+= target+"_";
        */
        if(Counting) {
            Count++;
            if(target.equals("w"))
                System.out.println("Replaces Run: "+Count);
        }
        //As of 1.8.8.1, '.' and '\n' are only codes for ' '. Spaces will be added before and after every \n, as well as after every period, then removed at the end.
        //'.'==' '
        if(target.startsWith("."))
            return realReplace(sofar, body,(" "+target.substring(1,target_size)),(" "+sub.substring(1,sub_size)));
        else if(target.endsWith("\n"))
            return realReplace(sofar, body,(target.substring(0,target_size-1)+" "),(sub.substring(0,sub_size-1)+" "));
        //space substitution
        if(target.endsWith(" "))
            if(sofar.length()<=2){ //that took longer than it should have. Anyone who can suggest improvements is welcome to try.
                if(target.equals("y "))
                    System.out.println(target);
                if((!sofar.contains("z"))&&(!sofar.contains("l"))){ //I think contains() covers it.
It saves time over endsWith() if it stops unnecessary calls to realReplace(), as long as it doesn't cut out possible permutations if(!sofar.contains("i"))// s->z if((target_size>=2)&&(target.charAt(target_size-2)!='s')&&(target.charAt(target_size-2)!='z')) //Double-checking s/z if(target.charAt(target_size-2)=='e') if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e')) body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"z ")); else if((sub_size>=2)&&(sub.charAt(sub_size-2)=='y')) body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"z ")); //s->z else body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"ez ")); //s->z else if(((target_size>=2)&&(target.charAt(target_size-2)=='y'))||(target_size<3)) //bug stopper if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e')) body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies "),(sub.substring(0,sub_size-1)+"z ")); else body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies "),(sub.substring(0,sub_size-1)+"iez ")); //s->z else body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"z ")); //s->z /* //y body = realReplace("qqq",body,"ay ","ey "); //stopgap, might want to revisit body = replace(body,"ey ","ey "); body = realReplace("qqq",body,"oy ","oi "); body = realReplace("qqq",body,"uy ","ahy "); body = realReplace("qqq",body,"y ","ee "); //might need generalized in replace() body = replace(body,"ty","tahy"); */ //ly, focus on y as of 1.7.4.3 - It might need some work if(target.equals("sly ")) //special case body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee ")); else{ //ly if((target_size>=5)&&(target.substring(target_size-5,target_size-1).equals("able"))) body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"y 
"),(sub.substring(0,sub_size-4)+"lee ")); //ably else if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='y')) if((sub_size>=3)&&(sub.substring(sub_size-3,sub_size-1).equals("ee"))) body = realReplace(sofar+"l",body,(target.substring(0,target_size-3)+"ily "),(sub.substring(0,sub_size-3)+"uhlee ")); else body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"ily "),(sub.substring(0,sub_size-2)+"uhlee ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='o')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"pily "),(sub.substring(0,sub_size-1)+"uhlee ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"tily "),(sub.substring(0,sub_size-1)+"uhlee ")); else body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee ")); //y if((target_size>=2)&&(target.charAt(target_size-2)=='a')) //might need work body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-2)+"ey ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-1)+"y ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='o')) body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-1)+"i ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='u')) body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-2)+"ahy ")); else 
if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"py "),(sub.substring(0,sub_size-1)+"ee ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"ty "),(sub.substring(0,sub_size-1)+"ee ")); else body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee ")); //might not be needed } if((!sofar.contains("g"))&&(!sofar.contains("i"))&&(!sofar.contains("r"))){ //covers multiple if((!target.endsWith("g "))&&(!target.endsWith("gs "))&&(!target.endsWith("gz "))) //leave no base uncovered if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ie"))) body = realReplace(sofar+"g",body,(target.substring(0,target_size-3)+"ying "),(sub.substring(0,sub_size-1)+"ing ")); //replacing 'ie' before gerund else if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"g",body,(target.substring(0,target_size-2)+"ing "),(sub.substring(0,sub_size-1)+"ing ")); //removing 'e' else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ping "),(sub.substring(0,sub_size-1)+"ing ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ting "),(sub.substring(0,sub_size-1)+"ing ")); else if((!target.endsWith("gs "))&&(!target.endsWith("gz "))) //no "ing\n" or s\z at end body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ing "),(sub.substring(0,sub_size-1)+"ing ")); //no e, presumably ends in consonant if((!sofar.contains("a"))&&(!sofar.contains("d"))) //ish if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3)) if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = 
realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"pish "),(sub.substring(0,sub_size-1)+"ish ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"tish "),(sub.substring(0,sub_size-1)+"ish ")); else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ed")))||(target_size<3)) body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"ish "),(sub.substring(0,sub_size-1)+"ish ")); if(!sofar.contains("a")) //able if((target_size>=2)&&(target.charAt(target_size-2)=='p')){ body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable "),(sub.substring(0,sub_size-1)+"uhbuhl ")); body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl ")); } else if((target_size>=2)&&(target.charAt(target_size-2)=='t')){ body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"table "),(sub.substring(0,sub_size-1)+"uhbuhl ")); body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl ")); } else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3)) body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl ")); else if(target.equals("fly")||target.equals("unfly")) body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl ")); else if(((target_size>=4)&&(target.substring(target_size-4,target_size-1).equals("ing")))||(target_size<4)) body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"eybuhl ")); } if((!sofar.contains("g"))&&(!sofar.contains("d"))){ //covers multiple if(target_size>=2) //d at end if(target.charAt(target_size-2)=='e') if((target_size>=3)&&(target.charAt(target_size-3)=='c')) body = 
realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d "),(sub.substring(0,sub_size-1)+"st ")); else body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d "),(sub.substring(0,sub_size-1)+"ed ")); //NOT st else if(target.charAt(target_size-2)=='s') body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed "),(sub.substring(0,sub_size-1)+"ed ")); else if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("se"))) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d "),(sub.substring(0,sub_size-1)+"ed ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ped "),(sub.substring(0,sub_size-1)+"ed ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ted "),(sub.substring(0,sub_size-1)+"ed ")); else if((target.charAt(target_size-2)!='s')||((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ss")))) body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed "),(sub.substring(0,sub_size-1)+"ed ")); //er if(!sofar.contains("r")) if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r "),(sub.substring(0,sub_size-1)+"er ")); //removing 'e' else if((target_size>=2)&&(target.charAt(target_size-2)=='p')) body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"per "),(sub.substring(0,sub_size-1)+"er ")); else if((target_size>=2)&&(target.charAt(target_size-2)=='t')) body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"ter "),(sub.substring(0,sub_size-1)+"er ")); else body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er "),(sub.substring(0,sub_size-1)+"er ")); } /* //ate, not bothering with fobiddances - Never mind if((target_size>=2)&&(target.charAt(target_size-2)=='e')) body = 
                realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e'
                else
                    body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n"));
                */
                //Why do these need to be dealt with here?
                //Because these permutations need to be available to figure out which \n grammars to apply
                //ed, ish, ly, ing, able, edly, ishly, ably, lying, eding, abling
                //Dirty method - add a recursion counter to replace()
                //6 max - ed ish ly ing able z
                //ablingly, lyingly - 3
                //ablinger
                //s-z, ly-l, ing-g, d-d, ish-i, able-a
                //everything abides i, nothing abides s/l
                //nevermind, not much likes i either
                //a allows l/s/d,
                //a forbids a, i
                //d forbids d, i
                //g forbids d, g, i, a
                //i forbids s, g, i, a
                //er-r
                //r forbids g, i, a, r
                //r is forbidden by s, l, g, d
                //y-y
                //Not messing with forbidding now (1.8.8.2)
                //I think that forbiddance is total - no forbidden suffixes at any point before
            }
        }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i+=(sub_size-target_size);
            }
        }
        return body;
    }
}
Edited January 20, 2012 by Kurkistan
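A note on the rule tables above: realReplace() treats a leading '.' or a trailing '\n' in a rule as a word-boundary space (its own comment calls them "only codes for ' '"), trims the sub in lockstep with the target, and then splices matches in a single left-to-right pass. The sketch below isolates just those two steps so they can be tried on small inputs. The class and method names are hypothetical, the input padding is an assumption modeled on that comment's description, and the recursive suffix permutations (-s/-z, -ly, -y, -ing, -ish, -able, -ed, -er) are deliberately left out.

```java
public class BoundaryRuleSketch {

    // First step of realReplace(): a leading '.' on a rule means "start of word"
    // and a trailing '\n' means "end of word"; both become ' '. As in the original,
    // the sub is trimmed in lockstep with the target, so it must carry the same markers.
    static String[] encode(String target, String sub) {
        if (target.startsWith(".")) {
            target = " " + target.substring(1);
            sub = " " + sub.substring(1); // drops the sub's first character, whatever it is
        }
        if (target.endsWith("\n")) {
            target = target.substring(0, target.length() - 1) + " ";
            sub = sub.substring(0, sub.length() - 1) + " "; // drops the sub's last character
        }
        return new String[] {target, sub};
    }

    // Same scan as the loop at the end of realReplace(): after a match at i, the search
    // resumes target.length() - 1 characters before the end of the inserted text, so the
    // trailing space of one replacement can also serve as the leading space of the next match.
    static String scan(String body, String target, String sub) {
        for (int i = 0; i <= body.length() - target.length(); i++) {
            if (body.startsWith(target, i)) { // startsWith() returns false for a negative i
                body = body.substring(0, i) + sub + body.substring(i + target.length());
                i += sub.length() - target.length();
            }
        }
        return body;
    }

    static String applyRule(String body, String target, String sub) {
        String[] rule = encode(target, sub);
        return scan(body, rule[0], rule[1]);
    }

    public static void main(String[] args) {
        // Assumed input: words separated by single spaces and padded at both ends,
        // roughly what the padding step described in realReplace()'s comment produces.
        String body = " vice vice nice ";
        System.out.println("[" + applyRule(body, ".vice\n", ".vahys\n") + "]");
        System.out.println("[" + applyRule(body, ".vice\n", "vahys\n") + "]");
        System.out.println("[" + applyRule(" racist ", "cist\n", "sist") + "]");
    }
}
```

Run as-is it prints "[ vahys vahys nice ]", "[ ahys ahys nice ]" and "[ rasis ]". The second and third lines show why a sub must carry the same markers as its target: a missing '.' costs the sub its first letter and a missing '\n' its last. The first line shows a side effect of resuming the scan inside the inserted text: with single spaces between words both copies of "vice" are converted, whereas " vice vice nice ".replace(" vice ", " vahys ") converts only the first, because each of its matches consumes the shared space.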
Turos Posted January 21, 2012 Author (edited) Turos and Kurkistan -- Firstly, you guys are awesome. Secondly, I'm envious of the amount of free time you have. 1st: 2nd: I apologize for not having done another run-through yet. I have been distracted with my new Kindle Touch. I will schedule a test tomorrow morning, as I have to go to sleep now and I work graveyard. Note the graveyard shift = no social life = extra free time! ... Edited January 21, 2012 by Turos
dhalagirl Posted January 22, 2012 Note the graveyard shift = no social life = extra free time! ... I have the opposite problem. Three jobs + limited social life = no free time and very little sleep.
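On the line-break report earlier in the thread (breaks vanish in Notepad but not in Notepad++): the transliterator reads its input with BufferedReader.readLine(), which drops the original line terminators, and joins the lines with a bare '\n'. Classic Windows Notepad only recognized the '\r\n' pair, so the whole file appears on one line there, while Notepad++ accepts either. A minimal sketch of restoring Windows line endings in one pass, assuming the complete output is already held in a single String (the class and method names are hypothetical):

```java
public class LineEndingSketch {

    // Hypothetical helper: turn every line break into "\r\n". Existing "\r\n" pairs are
    // collapsed to "\n" first, so text that already has Windows endings is not given "\r\r\n".
    static String toWindowsLineEndings(String text) {
        return text.replace("\r\n", "\n").replace("\n", "\r\n");
    }

    public static void main(String[] args) {
        String alethi = ".a line\n.another line\n";
        String fixed = toWindowsLineEndings(alethi);
        System.out.println(fixed.length() - alethi.length()); // prints 2: one '\r' per line break
        System.out.println(toWindowsLineEndings(fixed).equals(fixed)); // prints true: safe to apply twice
    }
}
```

String.replace(CharSequence, CharSequence) replaces every occurrence, so this does the same job as a character-by-character insertion loop without re-copying the string at each line break. If the output should follow the platform instead of always using CRLF, System.lineSeparator() (Java 7 and later) could be substituted for "\r\n".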
Kurkistan Posted January 22, 2012 (edited) 1st: 2nd: I apologize for not having done another run-through yet. I have been distracted with my new Kindle Touch. I will schedule a test tomorrow morning, as I have to go to sleep now and I work graveyard. Note the graveyard shift = no social life = extra free time! ... No problem, you've had a remarkable turnaround speed so far. This also gives me extra time to polish the next version before you test it. Speaking of that... Generalized .or\n to just .or
EDIT: Sped up inAlphabet() and fixed the file name within the code so that it can actually run.
/**
 * Goal: Provide an easy means of transliterating Roman letters into Alethi script using Turos's font conventions.
 *
 *
 * @author Kurkistan, with significant developmental input from Turos
 * @date 01/24/2012
 * @version 1.8.9.6
 */
import java.io.FileReader;
import java.io.FileWriter;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.File;
import java.io.PrintWriter;
import java.io.IOException;
import java.util.Scanner;
import java.io.BufferedReader;
import java.util.Arrays;

public class AlethiTransliterator_1_8_9_6
{
    static boolean debug_char = false;
    static boolean debug_end_e = false;
    static boolean remove_illegal = true;
    static boolean add_CR = true;
    /*
    static String Targets = "";
    static int min = 200;
    static int max = 400;
    */
    static int Count = 0;
    static boolean Counting = true;

    public static void main (String[] arg) throws IOException{
        Scanner input=new Scanner(System.in);
        System.out.print("Enter input file (full name of file in same directory): ");
        String temp = input.next();
        //temp = "Test.txt";
        final double startTime = System.currentTimeMillis();
        final double endTime;
        try {
            String alethi = convertText(temp);
            if(alethi.equals("&"))
                return;
            //putting carriage-returns back in to make it look pretty in Notepad. I can't tell what else they might do.
            if(add_CR)
                for(int i = 0; i<alethi.length();i++)
                    if(alethi.charAt(i)=='\n')
                        alethi = alethi.substring(0,i)+"\r"+alethi.substring(i++,alethi.length()); //i++ steps over the '\n' that the inserted '\r' pushed one place right
            //writeFile(Targets,"TEMP.txt");
            temp = "Alethi_"+temp;
            writeFile(alethi,temp);
            if(debug_char){
                String violations = allowedCharacters(alethi); //debugging blatant errors
                if(!violations.equals(""))
                    System.out.println("Unauthorized sections in text (Line:Violation):"+"\n"+violations);
            }
        }
        finally {
            endTime = System.currentTimeMillis();
        }
        final double duration = endTime - startTime;
        System.out.println("Execution time: "+(duration/1000)+" seconds");
    }

    private static String convertText(String roman) throws IOException
    {
        roman = readFile(roman); //text file
        if((roman.length()==1)&&(roman.charAt(0)=='&')) //invalid input, halt program
            return "&";
        if(remove_illegal)
            roman = removeCharacters(roman);
        roman = periodMover(roman);
        roman = spaceEnds(roman);
        String alethi = replaceLetters(roman);
        return unSpaceEnds(alethi);
    }

    /**
     * Load a text file's contents as a <code>String</code>.
     *
     * @param file The input file
     * @return The file contents as a <code>String</code>
     * @exception IOException IO Error
     */
    private static String readFile(String file) throws IOException
    {
        String whole = "";
        try {
            BufferedReader in = new BufferedReader(new FileReader(file));
            String str;
            while ((str = in.readLine()) != null) {
                whole = whole + str + '\n';
                //process(str);
            }
            in.close();
        } catch (IOException e) {
            System.out.println("File not in directory or misspelled.");
            return "&";
        }
        whole="\n"+whole.toLowerCase(); //convert to lower - keeping an extra \n at the end and beginning for replacement ease of use, will get rid of it
        return whole;
    }

    private static void writeFile(String text, String destination) throws IOException
    {
        File file = new File(destination);
        boolean created = file.createNewFile(); //false if the file already exists
        if (!created) {
            System.out.println("Output file already exists.");
            System.exit(0);
        }
        else {
            FileWriter fstream = new FileWriter(destination);
            BufferedWriter out = new BufferedWriter(fstream);
            out.write(text);
            out.close();
            System.out.println("File created successfully.");
        }
    }

    private static String allowedCharacters(String body)
    {
        //c, q, w, x, th, sh, ch - Forbidden; I assume no lowercase versions of the special characters (C, X)
        //\n, ' ', '.', C, S/s, T/t, X, - Allowed
        char[] library = new char[29];
        String[] pairs = {"th","sh","ch"}; //These shouldn't trigger unless I made a serious mistake in the "necessary" section.
        String violations = "";
        int line = 1; //for all of those +1ers out there
        int target_size = 2;
        int search = body.length() - target_size;
        for(int j = 0;j<pairs.length;j++){
            line = 1; //restart the count for each pair so later pairs report correct line numbers
            for(int i = 0; i<=search;i++)
                if(body.charAt(i)=='\n')
                    line++;
                else if(body.substring(i,i+target_size).equals(pairs[j]))
                    violations = violations + (line+":"+pairs[j]) + "; ";
        }
        library[0] = '\n';
        library[1] = ' ';
        library[2] = '.';
        library[3] = 'C';
        library[4] = 'S';
        library[5] = 'T';
        library[6] = 'X';
        int place = 7;
        for(int i = 97; i <=122; i++){
            if((i!=99)&&(i!=113)&&(i!=119)&&(i!=120)) //c, q, w, and x
                library[place++] = (char)i;
        }
        line = 1; //resetting
        for(int i = 0;i<body.length();i++)
            if(body.charAt(i)=='\n')
                line++;
            else if(Arrays.binarySearch(library,body.charAt(i))<0) //not in library
                violations = violations + (line+":"+body.charAt(i)) + "; ";
        return violations;
    }

    private static String removeCharacters(String body)
    {
        char[] library = new char[56];
        library[0] = '\t'; //tab
        library[1] = '\n';
        library[2] = ' ';
        library[3] = '.';
        int place = 4;
        for(int i = 65; i <=90; i++)
            library[place++] = (char)i;
        for(int i = 97; i <=122; i++)
            library[place++] = (char)i;
        for(int i = 0; i < body.length(); i++)
            if(Arrays.binarySearch(library,body.charAt(i))<0) //I felt embarrassed by my earlier search algorithm.
                if((body.charAt(i)=='?')||(body.charAt(i)=='!'))
                    body = body.substring(0,i)+"."+body.substring(i+1,body.length());
                else
                    body = body.substring(0,i)+body.substring(i--+1,body.length());
        return body;
    }

    /**
     * In the Alethi alphabet, sentences start with a period '.' and don't end with anything.
     */
    private static String periodMover(String body)
    {
        int start = 0;
        for(int i=0;i<body.length();i++)
        {
            if(body.charAt(i)=='.'){
                while((i<body.length())&&(body.charAt(i)=='.')) //multiples
                    body = body.substring(0,start)+"."+body.substring(start,i)+body.substring((i++)+1,body.length());
                while(i<body.length())
                    if(!inAlphabet(body.charAt(i)))
                        i++;
                    else
                        break; //Yes, the cardinal sin.
                start = i;
            }
            else if(body.charAt(i)=='\n')
                start=i+1; //Doesn't allow sentences to continue after true line breaks. Enables no-period headers and whatnot.
        }
        return body;
    }

    private static boolean inAlphabet(char character)
    {
        int value = (int)character;
        if((value>=97)&&(value<=122)) //just checking lowercase letters
            return true;
        return false;
    }

    private static String spaceEnds(String body){
        for(int i=0;i<body.length();i++)
            if(body.charAt(i)=='.')
                body = body.substring(0,i+1)+" "+body.substring((i++)+1,body.length());
            else if(body.charAt(i)=='\n'){
                body = body.substring(0,i)+" \n "+body.substring(i+1,body.length());
                i+=2;
            }
        //System.out.println(body);
        return body;
    }

    private static String unSpaceEnds(String body){
        for(int i=1;i<body.length()-2;i++)
            if(body.charAt(i)=='.')
                body = body.substring(0,i+1)+body.substring(i+2,body.length());
            else if(body.charAt(i)=='\n')
                body = body.substring(0,i-1)+"\n"+body.substring((i--)+2,body.length());
        if(body.charAt(body.length()-2)=='.')
            body = body.substring(0,body.length()-1);
        else if(body.charAt(body.length()-2)=='\n')
            body = body.substring(0,body.length()-3)+"\n";
        return body.substring(1,body.length()-1); //clipping first/last '\n'
    }

    public static void test()
    {
        String body = "\nbutler\n";
        String target = "ap\n";
        String sub = "op\n";
        System.out.println(replace(body,target,sub));
        int target_size = target.length();
        int sub_size = sub.length();
        String sofar = "";
        int j = 2;
        if((target_size>=2)&&(target.charAt(target_size-2)=='p')){
            body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
            body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able\n"),(sub.substring(0,sub_size-1)+"uhbuhl\n"));
        }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i+=(sub_size-target_size);
            }
        }
        System.out.println(body);
    }

    /**
     * Special characters:
For t, use lower case t. For th, use capital T. For s, use lower case s. For sh, use capital S. For ch, use c. X will print a combination of k and s. For q and w, use your imagination. Technically speaking, q is a combination of k and u. W is basically a combination of a long u ("oo") and any other vowel: a e i o and short u ("uh") */ private static String replaceLetters(String body) { //Ease of use //1.3.5-Threw in an If statement in the replace function to deal with space and \n at the same time //ph body = replace(body,"ph","f"); //anti- body = replace(body,".anti",".antahy"); //wh body = replace(body,"who\n","hoo\n"); body = replace(body,"where","huair"); //changed w to u body = replace(body,"whir","huur"); body = replace(body,"wh","hu"); //Might need more permutations body = replace(body,".accr",".uhkr"); //many many many body = replace(body,".acci",".aksi"); body = replace(body,".accord",".uhkawrd"); body = replace(body,".accomp",".uhkuhmp"); body = replace(body,".acco",".uhko"); body = replace(body,".accustom\n",".uhkuhstuhm\n"); body = replace(body,".accolade\n",".akuhleyd\n"); body = replace(body,".accus",".uhkyooz"); body = replace(body,".accurs",".uhkurs"); body = replace(body,".accur",".akyer"); body = replace(body,".accum",".uhkyoom"); body = replace(body,".accout",".uhkoot"); body = replace(body,".accoun",".uhkount"); body = replace(body,".acce",".akse"); //the dreaded double c's body = replace(body,".ecc",".eks"); body = replace(body,"ucca","uhka"); body = replace(body,"ucco","uhko"); body = replace(body,"uccu","uhku"); body = replace(body,".occ",".uhk"); body = replace(body,"ucce","uhkse"); body = replace(body,"ucci","uhksi"); body = replace(body,"occup","okyuh"); //very special case body = replace(body,"occa","uhkah"); body = replace(body,"occi","oksi"); body = replace(body,"occe","ochee"); //? 
body = replace(body,"occo","okuh"); body = replace(body,"occu","okuh"); //Just went down the list on http://www.morewords.com/contains/cc - Useful, if laborious //E at end - Some interference possible with C's body = replace(body,"use\n","yooz\n"); body = replace(body,"used\n","yoozd\n"); //special case //Note: Need to make sure that plurals of e-enders are covered, i.e. wives. body = replace(body,"like\n","lahyk\n"); body = replace(body,"ole\n","ohl\n"); //hyperbole will suffer body = replace(body,"ose\n","ohz\n"); body = replace(body,"ame\n","eym\n"); body = replace(body,"ese\n","eez\n"); body = replace(body,"have\n","hav\n"); body = replace(body,"ave\n","eyv\n"); body = replace(body,"eive\n","eev\n"); body = replace(body,"vive\n","vahyv\n"); body = replace(body,"ive\n","iv\n"); //body = replace(body,"ever\n","ever\n"); body = replace(body,"eve\n","eev\n"); //HOWEVER body = replace(body,"eever\n","ever\n"); body = replace(body,"ile\n","ahyl\n"); //System.out.println(replace(replace("while ","wh","hu"),"ile\n","ahyl\n")); //huahyl body = replace(body,"gle\n","guhl\n"); body = replace(body,".key\n",".kee\n"); //special body = realReplace("QQQ",body,".keys\n",".kees\n"); body = replace(body,"base\n","beys\n"); //And now the ends-with function on scrabblefinder.com was useful body = replace(body,"case\n","keys\n"); body = replace(body,"chase\n","Ceys\n"); //ch == C body = replace(body,"Case\n","Ceys\n"); //necessary? 
body = replace(body,"erase\n","ihreys\n"); body = replace(body,"ase\n","eez\n"); body = replace(body,"olve\n","olv\n"); body = replace(body,"alve\n","ahv\n"); body = replace(body,"elve\n","elv\n"); body = replace(body,"some\n","suhm\n"); body = replace(body,"come\n","cuhm\n"); //Need to move this up body = replace(body,"ome\n","ohm\n"); body = replace(body,"ttle\n","tl\n"); body = replace(body,"tle\n","tl\n"); //This is what dictionary.com said to do, and I live to serve body = replace(body,".discipline\n",".disipline\n"); body = replace(body,"ine\n","ahyn\n"); body = replace(body,".one\n",".uuhn\n"); body = replace(body,"done\n","duhn\n"); body = replace(body,"none\n","nuhn\n"); body = replace(body,"one\n","ohn\n"); body = replace(body,"ake\n","eyk\n"); body = replace(body,"ope\n","ohp\n"); body = replace(body,"rue\n","roo\n"); body = replace(body,"ife\n","ahyf\n"); body = replace(body,"bead\n","beed\n"); body = replace(body,".read\n",".reed\n"); body = replace(body,"nead\n","need\n"); body = replace(body,"lead\n","leed\n"); body = replace(body,"ead\n","ed\n"); //general body = replace(body,"ade\n","eyd\n"); //ere - their vs there body = realReplace("QQQ",body,"ere\n","eir\n"); body = replace(body,".are\n",".ahr\n"); body = replace(body,"are\n","air\n"); body = replace(body,"oke\n","ohk\n"); body = replace(body,"tire","tahyuhr"); //NOT \n or e body = replace(body,"aire\n","air\n"); //body = replace(body,"ire\n","yuhr\n"); //? 
        body = replace(body,"ype\n","ahyp\n");
        body = replace(body,"urge\n","urj\n");
        body = replace(body,"erge\n","urj\n"); //Not a mistake
        body = replace(body,"arge\n","hrj\n");
        body = replace(body,"orge\n","wrj\n");
        body = replace(body,"ime\n","ahym\n");
        body = replace(body,"sle\n","ahyl\n");
        body = replace(body,"promise\n","promis\n");
        body = replace(body,"aise\n","eyz\n");
        body = replace(body,"ise\n","ahyz\n");
        body = replace(body,"lse\n","ls\n");
        body = replace(body,"igue\n","teeg\n");
        body = replace(body,"sce\n","es\n");
        body = replace(body,"que\n","k\n");
        body = replace(body,"udge\n","uhj\n");
        body = replace(body,"dge\n","j\n"); //NOT sure
        body = replace(body,"age\n","aij\n");
        //gue - This one was irritating, might not be right
        body = replace(body,"logue\n","awg\n");
        body = replace(body,"gogue\n","awg\n");
        body = replace(body,".morgue\n",".mawrg\n");
        body = replace(body,".fugue\n",".fyoog\n");
        body = replace(body,".segue\n",".segwey\n");
        body = replace(body,"rgue\n","rgyoo\n");
        body = replace(body,"gue\n","eeg\n");
        //-nge
        body = replace(body,"nge\n","nj\n"); //problem with sing vs singe not really being separable at the gerund-testing level
        body = replace(body,"sinjing\n","singing\n"); //comprehensive fix for gerund mishaps
        body = replace(body,"slinjing\n","slinging\n");
        body = replace(body,"strinjing\n","stringing\n");
        body = replace(body,"swinjing\n","swinging\n");
        body = replace(body,"brinjing\n","bringing\n");
        body = replace(body,"flinjing\n","flinging\n");
        body = replace(body,"prinjing\n","pringing\n");
        body = replace(body,".winjing\n",".winging\n");
        body = replace(body,".zinjing\n",".zinging\n");
        body = replace(body,".dinjing\n",".dinging\n");
        body = replace(body,".pinjing\n",".pinging\n");
        //END E's
        //s at end - 1.7.4.5 -> unneeded, I think
        //body = replace(body,"es\n","ez\n");
        //Needs to go before c->s conversion, since C's are all soft S's
        //This is a big thing. I moved the c down mainly to allow for the s->z converter to do its job, and the judgement on whether or not this messes things up is pending.
        //START C 1.7 - moved so that higher number of characters in target gets preference, blocks kept cohesive
        //Stolen from the "necessary" bin.
        body = replace(body,"ch","C"); //Although both versions of C work, I'm assuming capitalized, so no lowercase c's are allowed in the text
        body = replace(body,"accent","aksent");
        body = replace(body,"exercise\n","eksersahyz\n");
        body = replace(body,".once",".wuhns");
        body = replace(body,"preface\n","prefis\n"); //special
        body = replace(body,"icise\n","uhsahyz\n");
        body = replace(body,"rcise\n","ruhsahyz\n");
        body = replace(body,".tacit\n",".tasit\n");
        body = replace(body,"ciate\n","sheeeyt\n");
        body = replace(body,"vate\n","vit\n"); //pulled from E section, might be a sign of things to come
        body = replace(body,"literate\n","literit\n");
        body = replace(body,"ate\n","eyt\n");
        body = replace(body,"cision\n","sizhuhn\n");
        body = replace(body,"cise\n","sahys\n");
        body = replace(body,"cist\n","sist\n");
        body = replace(body,"uce\n","us\n");
        body = replace(body,"uces\n","usez\n"); //z incorporated
        body = replace(body,"uced\n","usst\n"); //D's
        body = replace(body,"came\n","keym\n");
        body = replace(body,"came","kamuh");
        body = replace(body,"ct","kt");
        //factual
        body = replace(body,"tual\n","Cual\n");
        body = replace(body,".acid\n",".asid\n");
        body = replace(body,".aci",".uhsi");
        body = replace(body,"ierce\n","eers\n");
        body = replace(body,"ince\n","ins\n");
        //body = replace(body,".ance",".ahns");
        body = replace(body,".trance",".trahns");
        body = replace(body,"dance\n","dahns\n");
        body = replace(body,"Cance","Cahns");
        body = replace(body,"cance","cahns");
        body = replace(body,"lance","lahns");
        body = replace(body,"vance","vahns");
        body = replace(body,"ance\n","uhns\n");
        body = replace(body,"all\n","awl\n");
        body = replace(body,".supp",".suhpp"); //just a general rule
        body = replace(body,"appa","apuh");
        body = replace(body,"ppen","pen"); //double p's, might NOT be done
        body = replace(body,"pplet\n","plit\n");
        body = replace(body,"pple\n","puhl\n");
        body = realReplace("QQQ",body,".supplement\n",".suhpluhment\n"); //special case
        body = replace(body,"ppl","puhl");
        body = replace(body,"upp\n","uhp\n");
        body = replace(body,"oppor","oper");
        body = replace(body,"opp","uhp");
        body = replace(body,"ypp","ip");
        body = replace(body,"pp","p"); //Last ditch, should cover most before this
        body = replace(body,"tice\n","tis\n");
        body = replace(body,"arice\n","eris\n");
        body = replace(body,"orice\n","uhis\n");
        body = replace(body,"cipice\n","suhpis\n"); //patch for precipice
        body = replace(body,"ipice\n","uhpis\n");
        body = replace(body,".vice\n",".vahys\n");
        body = replace(body,"vice\n","vis\n");
        body = replace(body,"ice\n","ahys\n"); //Long S. NOT sure about \n's
        body = replace(body,"egy\n","ijee\n"); //possibilities/strategies fix, I have no idea how they ended up "kiez"
        body = replace(body,"ity\n","itee\n");
        body = replace(body,"ite\n","ahyt\n");
        body = replace(body,"irst\n","urst\n");
        body = replace(body,"ong\n","ong\n");
        body = replace(body,"ull\n","ool\n");
        body = replace(body,"cide\n","sahyd\n");
        body = replace(body,"ide\n","ahyd\n");
        body = replace(body,"ence\n","ens\n");
        body = replace(body,"rend\n","rend\n");
        //1.8.9 Pie-
        body = replace(body,"piety","pahyitee");
        body = replace(body,".pier\n",".peer\n");
        body = replace(body,".pie\n",".pahy\n");
        body = replace(body,".pie",".pee");
        body = replace(body,"ces\n","seez\n");
        body = replace(body,"cez\n","seez\n"); //In case of S->Z
        body = replace(body,"ce\n","s\n");
        body = replace(body,"ci\n","sahy\n");
        body = replace(body,"oy\n","oi\n");
        body = replace(body,"ace\n","eys\n");
        body = replace(body,".chull\n",".as\n");
        body = replace(body,".chull",".uhs"); //Assoc-
        body = replace(body,"ely\n","lee\n"); //MUST BE LAST IN \N
        body = replace(body,".scie",".sahye"); //For Science!
        body = replace(body,"sciou","shuh"); //For Conscience!
        body = replace(body,"cious","shuhs"); //For Ithaca!
        body = replace(body,"scio","shuh");
        body = replace(body,"scie","shuh");
        body = replace(body,"ply\n","plahy\n");
        body = replace(body,".by\n",".bahy\n");
        body = replace(body,".my\n",".mahy\n");
        body = replace(body,".die\n",".dahy\n");
        body = replace(body,".dye\n",".dahy\n");
        body = replace(body,".bye\n",".bahy\n"); //conflict
        body = replace(body,"hype","hahype");
        body = replace(body,"hypo","hahypo");
        body = replace(body,"hypn","hipn");
        body = replace(body,"hyphen","hahyfuhn");
        body = replace(body,"hyfen","hahyfuhn"); //ph->f
        body = replace(body,"yp","ip");
        body = replace(body,"duct","duhkt");
        body = replace(body,"stion","sCuhn"); //1.8.9.4
        body = replace(body,"tion","Suhn"); //1.8
        body = replace(body,"ssion","Suhn"); //1.8.6
        body = replace(body,"sion","zhuhn");
        body = replace(body,"cean","Suhn");
        body = replace(body,"ture","Cur");
        body = replace(body,"cies","seez"); //prophecies
        body = replace(body,"ciez","seez"); //s->z already done
        body = replace(body,"iew","yoo");
        body = replace(body,".face",".feys");
        body = replace(body,"face","feys");
        //For-
        body = replace(body,".fore",".fohr");
        body = replace(body,".for",".fohr");
        //ore, as in fore, bore
        body = replace(body,"ore","ohr");
        body = replace(body,"acen","eysuhn"); //Don't get complacent
        body = replace(body,"ician","ishuhn"); //musician
        body = replace(body,"cism","sizuhm"); //anglicanism
        body = replace(body,"cial","shul");
        body = replace(body,".acq",".akw"); //might need refinement
        body = replace(body,"cque","ke");
        body = replace(body,"acquaint","uhkweyeynt");
        body = replace(body,"cing","sing"); //1.6.5 - odyssey test
        body = replace(body,"exce","ikse");
        body = replace(body,"excit","iksahyt");
        body = replace(body,"excis","eksahyz");
        body = replace(body,"ici","isi"); //Sicily
        body = replace(body,"iec","ees"); //Piece/Peace -> Pees
        body = replace(body,"eac","ees");
        body = replace(body,"ight","ahyt");
        body = replace(body,"cep","sep");
        body = replace(body,"cin","sin");
        body = replace(body,".cit",".sit");
        body = replace(body,"cip","sip");
        body = replace(body,"cif","sif"); //NOT sure
        body = replace(body,"icc","ik");
        body = replace(body,"icn","ikn");
        body = replace(body,"sce","se");
        body = replace(body,"sci","si");
        body = replace(body,"scy","sahy");
        //body = replace(body,"sco","sko");
        body = replace(body,"cea","sea");
        body = replace(body,"nci","nsi"); //might need refinement
        body = replace(body,"ncy","nsee");
        body = replace(body,"cei","see");
        body = replace(body,"cee","see");
        body = replace(body,"cent","sent"); //odyssey
        body = replace(body,"it\n","it\n"); //Tacked on for suffix reasons
        body = replace(body,"ap\n","ap\n");
        //starting with c
        body = replace(body,".cy",".sahy");
        body = replace(body,".cir",".sur");
        body = replace(body,".cid",".sahyd");
        body = replace(body,".ci",".si");
        body = replace(body,".cer",".sur");
        body = replace(body,".ce",".se");
        body = replace(body,"ck","k");
        /*
        body = realReplace("QQQ",body,"C\n","k\n");
        body = realReplace("QQQ",body,"ch\n","k\n");
        */
        body = replace(body,"sc","sk");
        body = replace(body,"cy","see"); //1.4.3 - si->see
        body = replace(body,"ce","se");
        body = replace(body,"ca","ka");
        body = replace(body,"co","ko");
        body = replace(body,"cu","ku");
        body = replace(body,"ct","kt");
        body = replace(body,"cl","kl");
        body = replace(body,"cr","kr");
        body = realReplace("QQQ",body,".c",".k"); //This can possibly leave lowercase c's in the text, although I think that all properly spelled words should be covered here.
        body = realReplace("QQQ",body,"c\n","k\n"); //to stop mischief
        //END C'S
        //Not sure where to put this section
        //ss
        body = replace(body,"ss","s");
        body = replace(body,".be\n",".bee\n");
        body = replace(body,".maybe\n",".meybee\n");
        //gh
        body = replace(body,"gha","gah"); //This section needs work
        body = replace(body,"gho","goh");
        body = replace(body,"ought","awt");
        body = replace(body,"though","thoh");
        body = replace(body,"bough","bou");
        body = replace(body,"cough","kof");
        body = replace(body,"igh","ahy");
        body = replace(body,"gh\n","\n");
        body = replace(body,"gh","g");
        //to, too, two - Just a quick patch for those three words, not a general solution to any problem I can see
        body = replace(body,".to\n",".too\n");
        body = replace(body,".two\n",".too\n");
        //q at end
        body = realReplace("QQQ",body,"q\n","k\n");
        //w at end
        body = replace(body,".low\n",".loh\n"); //special cases
        body = replace(body,".row\n",".roh\n");
        body = replace(body,"ow\n","au\n");
        //.sy
        body = replace(body,".syr",".suhr"); //Moved up to e-enders
        body = replace(body,".syr",".sir");
        body = replace(body,".sly",".slahy");
        body = replace(body,".lying\n",".lahying\n");
        body = replace(body,".ly",".li");
        //sz->siz - The coward's way out. I need to sit down and make this thing more cohesive
        body = replace(body,"sz\n","siz\n");
        body = replace(body,"pie\n","pahy\n");
        // NOT normal, aka special
        body = realReplace("qqq",body,".or",".awr");
        body = realReplace("qqq",body,"y\n","ee\n");
        body = realReplace("qqq",body,"ehee\n","ehy\n");
        body = realReplace("qqq",body,"ahee\n","ahy\n");
        body = realReplace("qqq",body,"eee\n","ey\n"); //fixing issues raised by y->ee as compared to other phonetics
        String[] temp = {"en","st","un","c","f","g","s","t",""};
        body = replace(body,"ctable\n","kteybuhl\n"); //save the c's!
        //short whole words (enable, stable, unable, cable, fable, gable, sable, table, able) get "eybuhl"
        for(int i = 0; i<temp.length;i++)
            if(temp[i].equals("c"))
                body = replace(body,".kable\n",".keybuhl\n");
            else
                body = replace(body,"."+temp[i]+"able\n","."+temp[i]+"eybuhl\n");
        body = replace(body,"able\n","uhbuhl\n"); //This one is either "eybuhl" for a few short words or "uhbuhl" for all others
        body = replace(body,"ble\n","buhl\n");
        //x's
        body = replace(body,".xy",".zi");
        body = replace(body,"xious","kSuhs");
        //General fixer for suffixes
        //body = replace(body,"\n","\n");
        //The annoying part is the hodge-podgeness of English. The only workable route may be just to demand phonetic spelling in cases like "Tow"
        //Necessary --Moved down to make ease-of-use conversions easier
        body = replace(body,"th","T");
        body = replace(body,"sh","S");
        //body = replace(body,"ch","C"); //took some liberties here, capitalized the C to make room for the c->k/s conversion
        body = replace(body,"x","X"); //Consistency - x is really a compound character of ks.
        body = replace(body,"qu","ku");
        //body = replace(body,"q","ku");
        /*
        body = replace(body,"wa","ua"); //Unnecessary? I think not! I'm not sure why, but no.
        body = replace(body,"we","ue");
        body = replace(body,"wi","ui");
        body = replace(body,"wo","uo");
        body = replace(body,"wu","uu");
        */
        body = replace(body,"w","u");
        //exception catcher
        if(debug_end_e){
            body = replace(body,"e\n","Q\n"); //Just for debugging
            body = replace(body,".TQ",".Te");
            body = replace(body,".bQ",".be");
            body = replace(body,".seQ",".seee");
            body = replace(body,".mQ",".me");
            body = replace(body,"eQ\n","ee\n");
            body = replace(body,"Qy\n","ey\n");
            body = replace(body,".hQ",".he");
            body = replace(body,".shQ",".she");
        }
        return body;
    }
    private static String replace(String body, String target, String sub){
        return realReplace("",body,target,sub);
    }
    private static String realReplace(String sofar, String body, String target, String sub) {
        int target_size = target.length();
        int sub_size = sub.length();
        /*
        if((min<Count++)&&(max>Count))
            Targets+= target+"_";
        */
        if(Counting) {
            Count++;
            if(target.equals("w"))
                System.out.println("Replaces Run: "+Count);
        }
        //As of 1.8.8.1, '.' and '\n' are only codes for ' '. Spaces will be added before and after every \n, as well as after every period, then removed at the end.
        //'.'==' '
        if(target.startsWith("."))
            return realReplace(sofar, body,(" "+target.substring(1,target_size)),(" "+sub.substring(1,sub_size)));
        else if(target.endsWith("\n"))
            return realReplace(sofar, body,(target.substring(0,target_size-1)+" "),(sub.substring(0,sub_size-1)+" "));
        //space substitution
        if(target.endsWith(" "))
            if(sofar.length()<=2){ //that took longer than it should have. Anyone who can suggest improvements is welcome to try.
                if(target.equals("y "))
                    System.out.println(target);
                if((!sofar.contains("z"))&&(!sofar.contains("l"))){ //I think contains() covers it. It saves time over endsWith() if it stops unnecessary calls to realReplace(), as long as it doesn't cut out possible permutations
                    if(!sofar.contains("i")) // s->z
                        if((target_size>=2)&&(target.charAt(target_size-2)!='s')&&(target.charAt(target_size-2)!='z')) //Double-checking s/z
                            if(target.charAt(target_size-2)=='e')
                                if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e'))
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"z "));
                                else if((sub_size>=2)&&(sub.charAt(sub_size-2)=='y'))
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"z ")); //s->z
                                else
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"ez ")); //s->z
                            else if(((target_size>=2)&&(target.charAt(target_size-2)=='y'))||(target_size<3)) //bug stopper
                                if((sub_size>=2)&&(sub.charAt(sub_size-2)=='e'))
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies "),(sub.substring(0,sub_size-1)+"z "));
                                else
                                    body = realReplace(sofar+"z",body,(target.substring(0,target_size-2)+"ies "),(sub.substring(0,sub_size-1)+"iez ")); //s->z
                            else
                                body = realReplace(sofar+"z",body,(target.substring(0,target_size-1)+"s "),(sub.substring(0,sub_size-1)+"z ")); //s->z
                    /*
                    //y
                    body = realReplace("qqq",body,"ay ","ey "); //stopgap, might want to revisit
                    body = replace(body,"ey ","ey ");
                    body = realReplace("qqq",body,"oy ","oi ");
                    body = realReplace("qqq",body,"uy ","ahy ");
                    body = realReplace("qqq",body,"y ","ee "); //might need to be generalized in replace()
                    body = replace(body,"ty","tahy");
                    */
                    //ly, focus on y as of 1.7.4.3 - It might need some work
                    if(target.equals("sly ")) //special case
                        body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee "));
                    else{ //ly
                        if((target_size>=5)&&(target.substring(target_size-5,target_size-1).equals("able")))
                            body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"y "),(sub.substring(0,sub_size-4)+"lee ")); //ably
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                            body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='y'))
                            if((sub_size>=3)&&(sub.substring(sub_size-3,sub_size-1).equals("ee")))
                                body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"ily "),(sub.substring(0,sub_size-3)+"uhlee ")); //target drops only "y ", same as the branch below
                            else
                                body = realReplace(sofar+"l",body,(target.substring(0,target_size-2)+"ily "),(sub.substring(0,sub_size-2)+"uhlee "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='o'))
                            body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                            body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"pily "),(sub.substring(0,sub_size-1)+"uhlee "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='t'))
                            body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"tily "),(sub.substring(0,sub_size-1)+"uhlee "));
                        else
                            body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee "));
                        //y
                        if((target_size>=2)&&(target.charAt(target_size-2)=='a')) //might need work
                            body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-2)+"ey "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                            body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-1)+"y "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='o'))
                            body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-1)+"i "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='u'))
                            body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"y "),(sub.substring(0,sub_size-2)+"ahy "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                            body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"py "),(sub.substring(0,sub_size-1)+"ee "));
                        else if((target_size>=2)&&(target.charAt(target_size-2)=='t'))
                            body = realReplace(sofar+"y",body,(target.substring(0,target_size-1)+"ty "),(sub.substring(0,sub_size-1)+"ee "));
                        else
                            body = realReplace(sofar+"l",body,(target.substring(0,target_size-1)+"ly "),(sub.substring(0,sub_size-1)+"lee ")); //might not be needed
                    }
                    if((!sofar.contains("g"))&&(!sofar.contains("i"))&&(!sofar.contains("r"))){ //covers multiple
                        if((!target.endsWith("g "))&&(!target.endsWith("gs "))&&(!target.endsWith("gz "))) //leave no base uncovered
                            if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ie")))
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-3)+"ying "),(sub.substring(0,sub_size-1)+"ing ")); //replacing 'ie' before gerund
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-2)+"ing "),(sub.substring(0,sub_size-1)+"ing ")); //removing 'e'
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ping "),(sub.substring(0,sub_size-1)+"ing "));
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='t'))
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ting "),(sub.substring(0,sub_size-1)+"ing "));
                            else if((!target.endsWith("gs "))&&(!target.endsWith("gz "))) //no "ing\n" or s\z at end
                                body = realReplace(sofar+"g",body,(target.substring(0,target_size-1)+"ing "),(sub.substring(0,sub_size-1)+"ing ")); //no e, presumably ends in consonant
                        if((!sofar.contains("a"))&&(!sofar.contains("d"))) //ish
                            if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                                    body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"pish "),(sub.substring(0,sub_size-1)+"ish "));
                                else if((target_size>=2)&&(target.charAt(target_size-2)=='t'))
                                    body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"tish "),(sub.substring(0,sub_size-1)+"ish "));
                                else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ed")))||(target_size<3))
                                    body = realReplace(sofar+"i",body,(target.substring(0,target_size-1)+"ish "),(sub.substring(0,sub_size-1)+"ish "));
                        if(!sofar.contains("a")) //able
                            if((target_size>=2)&&(target.charAt(target_size-2)=='p')){
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"pable "),(sub.substring(0,sub_size-1)+"uhbuhl "));
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl "));
                            }
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='t')){
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"table "),(sub.substring(0,sub_size-1)+"uhbuhl "));
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl "));
                            }
                            else if(((target_size>=3)&&(!target.substring(target_size-3,target_size-1).equals("ly")))||(target_size<3))
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl "));
                            else if(target.equals("fly ")||target.equals("unfly ")) //target always ends in ' ' here
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"uhbuhl "));
                            else if(((target_size>=4)&&(target.substring(target_size-4,target_size-1).equals("ing")))||(target_size<4))
                                body = realReplace(sofar+"a",body,(target.substring(0,target_size-1)+"able "),(sub.substring(0,sub_size-1)+"eybuhl "));
                    }
                    if((!sofar.contains("g"))&&(!sofar.contains("d"))){ //covers multiple
                        if(target_size>=2) //d at end
                            if(target.charAt(target_size-2)=='e')
                                if((target_size>=3)&&(target.charAt(target_size-3)=='c'))
                                    body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d "),(sub.substring(0,sub_size-1)+"st "));
                                else
                                    body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d "),(sub.substring(0,sub_size-1)+"ed ")); //NOT st
                            else if(target.charAt(target_size-2)=='s')
                                body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed "),(sub.substring(0,sub_size-1)+"ed "));
                            else if((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("se")))
                                body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"d "),(sub.substring(0,sub_size-1)+"ed "));
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                                body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ped "),(sub.substring(0,sub_size-1)+"ed "));
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='t'))
                                body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ted "),(sub.substring(0,sub_size-1)+"ed "));
                            else if((target.charAt(target_size-2)!='s')||((target_size>=3)&&(target.substring(target_size-3,target_size-1).equals("ss"))))
                                body = realReplace(sofar+"d",body,(target.substring(0,target_size-1)+"ed "),(sub.substring(0,sub_size-1)+"ed "));
                        //er
                        if(!sofar.contains("r"))
                            if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                                body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r "),(sub.substring(0,sub_size-1)+"er ")); //removing 'e'
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='p'))
                                body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"per "),(sub.substring(0,sub_size-1)+"er "));
                            else if((target_size>=2)&&(target.charAt(target_size-2)=='t'))
                                body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"ter "),(sub.substring(0,sub_size-1)+"er "));
                            else
                                body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er "),(sub.substring(0,sub_size-1)+"er "));
                    }
                    /*
                    //ate, not bothering with forbiddances - Never mind
                    if((target_size>=2)&&(target.charAt(target_size-2)=='e'))
                        body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"r\n"),(sub.substring(0,sub_size-1)+"er\n")); //removing 'e'
                    else
                        body = realReplace(sofar+"r",body,(target.substring(0,target_size-1)+"er\n"),(sub.substring(0,sub_size-1)+"er\n"));
                    */
                    //Why do these need to be dealt with here?
                    //Because these permutations need to be available to figure out which \n grammars to apply
                    //ed, ish, ly, ing, able, edly, ishly, ably, lying, eding, abling
                    //Dirty method - add a recursion counter to replace()
                    //6 max - ed ish ly ing able z
                    //ablingly, lyingly - 3
                    //ablinger
                    //s-z, ly-l, ing-g, d-d, ish-i, able-a
                    //everything abides i, nothing abides s/l
                    //nevermind, not much likes i either
                    //a allows l/s/d,
                    //a forbids a, i
                    //d forbids d, i
                    //g forbids d, g, i, a
                    //i forbids s, g, i, a
                    //er-r
                    //r forbids g, i, a, r
                    //r is forbidden by s, l, g, d
                    //y-y
                    //Not messing with forbidding now (1.8.8.2)
                    //I think that forbiddance is total - no forbidden suffixes at any point before
                }
            }
        for(int i = 0; i<=body.length()-target_size;i++) {
            if(body.substring(i,i+target_size).equals(target)) {
                body = body.substring(0,i)+sub+body.substring(i+target_size,body.length());
                i = Math.max(i+(sub_size-target_size), -1); //never step back past the start of body
            }
        }
        return body;
    }
}
Edited January 24, 2012 by Kurkistan
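For anyone trying to follow realReplace(), here is a minimal, self-contained sketch of its marker convention, assuming only what the comments in the code say: a leading '.' and a trailing '\n' in a rule both stand for a space. The names MarkerDemo, expand(), applyRule() and loopReplace() are made up for illustration and are not part of the transliterator. Because realReplace() drops the first character of sub whenever the target starts with '.' (and the last character of sub whenever the target ends with '\n'), applyRule() refuses rules whose target and sub use different markers instead of silently losing a letter. loopReplace() mirrors the scanning loop at the end of realReplace(), with a Math.max guard so the index never drops below -1.

```java
// Hypothetical sketch: MarkerDemo, expand(), applyRule() and loopReplace() are
// illustrative names, not part of the AlethiTS transliterator.
public class MarkerDemo {

    // A leading '.' and a trailing '\n' in a rule both stand for a space.
    static String expand(String rule) {
        String s = rule;
        if (s.startsWith(".")) {
            s = " " + s.substring(1);
        }
        if (s.endsWith("\n")) {
            s = s.substring(0, s.length() - 1) + " ";
        }
        return s;
    }

    // Applies one rule to space-padded text. A target and sub that use different
    // markers are rejected instead of silently losing a letter.
    static String applyRule(String body, String target, String sub) {
        boolean sameStart = target.startsWith(".") == sub.startsWith(".");
        boolean sameEnd = target.endsWith("\n") == sub.endsWith("\n");
        if (!sameStart || !sameEnd) {
            throw new IllegalArgumentException("marker mismatch: " + target.trim() + " -> " + sub.trim());
        }
        // String.replace works left to right and never rescans replaced text.
        return body.replace(expand(target), expand(sub));
    }

    // Mirrors the scanning loop at the end of realReplace(); Math.max keeps i from
    // dropping below -1 when a shrinking match is found at index 0.
    static String loopReplace(String body, String target, String sub) {
        int targetSize = target.length();
        int subSize = sub.length();
        for (int i = 0; i <= body.length() - targetSize; i++) {
            if (body.substring(i, i + targetSize).equals(target)) {
                body = body.substring(0, i) + sub + body.substring(i + targetSize);
                i = Math.max(i + (subSize - targetSize), -1);
            }
        }
        return body;
    }

    public static void main(String[] args) {
        String body = " the key keys are near the monkey ";
        System.out.println("[" + applyRule(body, ".key\n", ".kee\n") + "]");
        System.out.println(loopReplace("sss", "ss", "s")); // s
        System.out.println("sss".replace("ss", "s"));      // ss
    }
}
```

The last two lines show why the two approaches are not interchangeable: the hand-written loop resumes scanning inside the text it just substituted, so the ss->s rule collapses "sss" all the way to "s", while String.replace skips past each replacement and gives "ss". Swapping one for the other would change some outputs.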
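The suffix handling is the hardest part of realReplace() to read. As a rough illustration, and assuming a base rule whose target ends in 'e' and has already been space-terminated (as realReplace() sees it after the marker conversion), the sketch below mirrors only three of its branches: the plural (s->z), the gerund and the past tense. SuffixDemo and inflect() are hypothetical names; the real method also recurses, tracks the suffixes already applied in sofar, and covers -ly, -y, -ish, -able and -er, none of which are reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of three of realReplace()'s suffix branches for
// a base rule whose target ends in 'e'. Not the transliterator's actual code.
public class SuffixDemo {

    // Returns {target, sub} pairs derived from a base rule such as {"ace ", "eys "}.
    static List<String[]> inflect(String target, String sub) {
        List<String[]> out = new ArrayList<>();
        int t = target.length();
        int s = sub.length();
        if (t < 2 || target.charAt(t - 2) != 'e' || !target.endsWith(" ") || !sub.endsWith(" ")) {
            return out; // only space-terminated rules ending in 'e' are handled here
        }
        String stemT = target.substring(0, t - 1); // target without the trailing space
        String stemS = sub.substring(0, s - 1);    // sub without the trailing space

        // plural: the sub gets "z" if it already ends in 'e' or 'y', otherwise "ez"
        char last = s >= 2 ? sub.charAt(s - 2) : ' ';
        String plural = (last == 'e' || last == 'y') ? "z " : "ez ";
        out.add(new String[] {stemT + "s ", stemS + plural});

        // gerund: drop the 'e' from the target, append "ing" to the sub
        out.add(new String[] {target.substring(0, t - 2) + "ing ", stemS + "ing "});

        // past tense: "-ced" targets take "st" in the sub, everything else "ed"
        boolean ce = t >= 3 && target.charAt(t - 3) == 'c';
        out.add(new String[] {stemT + "d ", stemS + (ce ? "st " : "ed ")});
        return out;
    }

    public static void main(String[] args) {
        for (String[] rule : inflect("ace ", "eys ")) {
            System.out.println("[" + rule[0] + "] -> [" + rule[1] + "]");
        }
    }
}
```

Running main() on the ace -> eys rule prints three derived rules: aces -> eysez, acing -> eysing and aced -> eysst. Generating these variants from each base rule is what lets a single line like the ace rule also cover its inflected forms.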