Creating a dictionary

Viewing 4 reply threads
  • May 8, 2022 at 1:30 AM #47234

    Goutalco
    Participant

    Hello everybody,
    I would like to create a dictionary database for classical languages based on some scanned dictionaries (19th and 20th century) that are available as PDF files. The PDF files are not searchable, and because they mix several alphabets and diacritical characters, OCR is not reliable (creating searchable PDF files would be very time-consuming).

    The database should work with user input based on the roots of the language.

    Example for one PDF file. Let’s call the dictionary DIC and let ABC be a root.

    All entries in DIC pertaining to the Root ABC are present on a certain page of DIC.

    I have a list of all possible roots and the page number pertaining to each root, for example

    ABC: 23
    ACE: 51
    BBD: 134

    I would like to create a Tap Forms 5 document that offers one of the following two options:

    When the user searches for Root ABC, either

    1. Tap Forms 5 tells Preview or any other PDF application to open the dictionary DIC and show page 23,
    2. or a form which contains all page images of DIC opens record #23.

    For performance reasons (the PDF files are quite large – several GB) I’m inclined to opt for the second option.

    So I have to create a form with a field that is able to accommodate photos obtained from the pages of DIC.
    My question is:
    1. How can I import thousands of photos into a Tap Forms 5 form
    2. and how can I search for roots?

    The answer to the second question is probably connected to the Prompter class of the JavaScript API. But how can I import all the photos inside a Tap Forms 5 document?

    Cheers

    May 8, 2022 at 11:39 AM #47235

    Brendan
    Keymaster

    Hi Goutalco,

    Do you have a CSV file that contains references to the photos? You can use the Import Records function to import a CSV file that has a reference to a photo filename. Then you can also specify a folder that contains the photos. Tap Forms will import the CSV file with the data along with the photos.
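
    A minimal sketch of how such a CSV file could be generated from the root/page list in the first post. The column names (`Root`, `Page`, `Photo`), the `buildImportCsv` helper, and the image naming scheme (`DIC-0023.png`) are assumptions made for illustration, not something Tap Forms requires.

```javascript
// Hypothetical root-to-page list, as in the first post.
const rootPages = [
    { root: "ABC", page: 23 },
    { root: "ACE", page: 51 },
    { root: "BBD", page: 134 },
];

// Quote a CSV field only when it contains a comma, a quote, or a line break.
function csvField(value) {
    const text = String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build the CSV text: one header row, then one row per root.
// The photo filename pattern (prefix, zero-padded page, ".png") is an assumption.
function buildImportCsv(entries, prefix) {
    const rows = [["Root", "Page", "Photo"]];
    for (const { root, page } of entries) {
        const photo = prefix + "-" + String(page).padStart(4, "0") + ".png";
        rows.push([root, page, photo]);
    }
    return rows.map(row => row.map(csvField).join(",")).join("\n");
}

console.log(buildImportCsv(rootPages, "DIC"));
```

    The resulting text can be saved as a `.csv` file next to the folder of page images and then selected in the Import Records dialog.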

    You can search just by typing into the Search field. You can also create Saved Searches if you search for the same value all the time.

    But I’m not really sure about this part:

    Tap Forms 5 tells Preview or any other PDF application to open the dictionary DIC and show page 23,

    Tap Forms can display a PDF file itself, without having to load another program. Or you could set up a Website Address field which, when clicked, will launch whatever app you have that displays PDF files (Preview, Acrobat, etc.). You could even customize the address based on values stored in other fields:

    For example:

    file:///Users/brendan/Documents/Some-PDF-[Root].pdf

    By putting [Root] in the URL, when you click on the website button, Tap Forms will inject the value from the Root field into the URL and it will open that file.
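
    To illustrate the substitution, here is a rough sketch of how a `[Field]` placeholder could be expanded into a URL. This is not Tap Forms' actual implementation; the `expandPlaceholders` helper and the percent-encoding step are assumptions made for the example.

```javascript
// Replace each [Name] placeholder with the matching field value.
// Placeholders without a matching field are left untouched.
function expandPlaceholders(template, fields) {
    return template.replace(/\[([^\[\]]+)\]/g, (match, name) =>
        Object.prototype.hasOwnProperty.call(fields, name)
            ? encodeURIComponent(fields[name])
            : match
    );
}

const url = expandPlaceholders(
    "file:///Users/brendan/Documents/Some-PDF-[Root].pdf",
    { Root: "ABC" }
);
console.log(url); // file:///Users/brendan/Documents/Some-PDF-ABC.pdf
```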

    Thanks,

    Brendan

    May 8, 2022 at 2:17 PM #47236

    Goutalco
    Participant

    Thank you Brendan,

    I converted all my dictionaries to images and using your hint I successfully imported these images to my document via a CSV file. This part was quite straightforward.

    Now I would like to create a search inside a form script via the Prompter class. Since my understanding of JavaScript is rudimentary at best, I would really appreciate any comments on the script below, which I put together using the help file.

    
    // Receives the user's input from the Prompter (see addParameter below)
    var root;

    var user_input = function printOut(continued) {
        if (continued == true) {
            var page_for_root;

            // Get the pages (records) of the dictionary
            var pages = form.getRecords();

            // ID of the field that holds the first root present on a page
            var root_1_id = 'fld-e724350c999d4fddafc12d258d407a3c';

            // ID of the field that holds the last root present on a page
            var root_2_id = 'fld-a01b915c72904b2caba68eb8c2657055';

            for (var index = 0, count = pages.length; index < count; index++) {
                // Get the first and the last root present on the current page
                var root_1 = pages[index].getFieldValue(root_1_id);
                var root_2 = pages[index].getFieldValue(root_2_id);

                // Skip pages where either root field is empty
                if (!root_1 || !root_2) {
                    continue;
                }

                // Compare the root the user is searching for to root_1 and root_2
                var compare_1 = root_1.localeCompare(root, 'ar');
                var compare_2 = root_2.localeCompare(root, 'ar');

                // Take the first page whose range contains root and leave the loop
                if (compare_1 <= 0 && 0 <= compare_2) {
                    page_for_root = pages[index];
                    break;
                }
            }

            // If at least one page contains root, show the first one
            if (page_for_root !== undefined) {
                form.selectRecord(page_for_root);
            } else {
                console.log("Root not found");
            }
        } else {
            console.log("Cancel button pressed.");
        }
    };

    function Search_For_Root() {
        let prompter = Prompter.new();
        prompter.cancelButtonTitle = 'Cancel';
        prompter.continueButtonTitle = 'OK';
        prompter.addParameter('Root: ', 'root').show('Enter a root', user_input);
    }

    Search_For_Root();
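
    The range check at the heart of this script can be tried outside Tap Forms as a plain function. The sketch below is one possible way to isolate it, assuming each page is reduced to a plain object with `first` and `last` properties (hypothetical names standing in for the two root fields). It uses `Intl.Collator` in place of repeated `localeCompare` calls and skips pages where either root is empty.

```javascript
// Return the first page whose [first, last] root range contains `root`,
// or undefined when no page matches.
function findPageForRoot(pages, root, locale) {
    const collator = new Intl.Collator(locale);
    for (const page of pages) {
        // Skip records with an empty first or last root.
        if (!page.first || !page.last) {
            continue;
        }
        if (collator.compare(page.first, root) <= 0 &&
            collator.compare(root, page.last) <= 0) {
            return page;
        }
    }
    return undefined;
}

// Hypothetical sample data: page number plus first and last root on that page.
const samplePages = [
    { number: 23, first: "ABC", last: "ACD" },
    { number: 24, first: "", last: "" },
    { number: 51, first: "ACE", last: "BBC" },
];

const hit = findPageForRoot(samplePages, "ADA", "en");
console.log(hit ? hit.number : "Root not found");
```

    Inside a form script, the plain objects would be built from `getFieldValue` calls on each record, and the locale would be `'ar'` as in the script.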
    

    Thank you for your patience.

    Cheers

    May 10, 2022 at 7:18 PM #47271

    Goutalco
    Participant

    Hello everybody,

    Since the number of dictionaries is significant, importing an image for every page of every dictionary blows up my Tap Forms 5 document to a size > 40 GB. So I switched to the first option from my original post, i.e. searching for a root and opening the dictionary as a PDF file.

    The solution suggested by Brendan, setting up a Website Address field which launches the default PDF viewer when clicked, did not work for me. I got the following error:

    
    The application “Tap Forms 5” does not have permission to open “Test.pdf.”
    

    So I found another solution which opens the PDF Viewer via a form script and tells the default PDF viewer to open the dictionary on the page that contains the term searched by the user.

    In case anybody is confronted with a similar problem, i.e. searching for alphabetically arranged items in PDF files, I share my script.

    
    // Receives the user's input from the Prompter (see addParameter below)
    var root;

    var user_input = function Show_Page(ok) {
        if (ok == true) {
            // Variable for the record found for root
            var page_for_root;

            // Get the pages (records) of the dictionary
            var pages = form.getRecords();

            // ID of the field that holds the first root present on a page
            var root_1_id = 'fld-8ab2fc88535e49dbb60df845f4157590';

            // ID of the field that holds the last root present on a page
            var root_2_id = 'fld-fa1b6724b80f4780b1779ab4ef098c75';

            // Loop through the pages
            for (var index = 0, count = pages.length; index < count; index++) {
                // Get the first and the last root present on the current page
                var root_1 = pages[index].getFieldValue(root_1_id);
                var root_2 = pages[index].getFieldValue(root_2_id);

                // Skip pages where either root field is empty
                if (!root_1 || !root_2) {
                    continue;
                }

                // Compare root to root_1 and root_2
                var compare_1 = root_1.localeCompare(root, 'ar');
                var compare_2 = root_2.localeCompare(root, 'ar');

                // Take the first page whose range contains root and leave the loop
                if (compare_1 <= 0 && 0 <= compare_2) {
                    page_for_root = pages[index];
                    break;
                }
            }

            // If at least one page contains root, show the first one
            if (page_for_root !== undefined) {
                // Get the URI of the dictionary file (with the appropriate
                // page number) stored in DevonThink
                var devon_id = 'fld-16ada2521a864aae8646222a6661a422';
                var devon = page_for_root.getFieldValue(devon_id);
                // Open the dictionary on the requested page
                Utils.openUrl(devon);
                // Go to the page (record) of the dictionary in Tap Forms 5
                form.selectRecord(page_for_root);
            } else {
                console.log("Root not found");
            }
        } else {
            console.log("Cancel button pressed.");
        }
    };

    function Search_Dic() {
        let prompter = Prompter.new();
        prompter.cancelButtonTitle = 'Cancel';
        prompter.continueButtonTitle = 'OK';
        prompter.addParameter('Root: ', 'root')
            .show('Dictionary', user_input);
    }

    Search_Dic();
    
    May 16, 2022 at 4:49 AM #47344

    Goutalco
    Participant

    Hello everybody,

    I successfully set up my dictionary document and linked the entries (via URI using the Web Site field) to a set of scanned dictionaries located in a DevonThink database (adding the page corresponding to the entries to the URI).

    Now I have to create my own dictionary. In its final state, the content of the dictionary will be in several languages, often using non-Latin alphabets such as Ancient Greek and sometimes written right to left, e.g. Semitic languages like Arabic, Hebrew, and Syriac. Some content may be mixed text that contains all of the above-mentioned languages. For readability, each text chunk in a given language needs its own font, one that displays certain features of the language more distinctly than fonts covering a vast number of Unicode glyphs, like Noto. Some fields will contain mathematical notation, mostly symbols used in logic and set theory.

    If the Markdown field of Tap Forms 5 accepted extended syntax (MathJax) and HTML (to display right-to-left content), and if the rendering of text could be changed via CSS, this post would be superfluous.

    Since this is not the case I have to look for other solutions.

    A simplified subset of my Tap Forms document is structured as follows:

    1. A Lexemes form: contains the entries of the dictionaries.
    2. A Meanings form: each lexeme has many meanings and each meaning belongs to one lexeme (one-to-many relationship).
    3. A References form: each meaning has many references and each reference has many meanings (many-to-many relationship). References are passages from sources, i.e. books, websites, etc., pertaining to the meaning.

    For flexibility, and because the order of the references pertaining to a meaning matters, I have implemented the many-to-many relationship from Meanings to References via two one-to-many relationships: Meanings 1 -> n MeRef m <- 1 References.

    Now I would like to display in each Lexemes record:

    1. the meanings that are related to the record
    2. and the references that are related to each meaning.

    Example:

    • lexeme 1
    – meaning 11
    + reference 111
    + reference 112

    – meaning 12
    + reference 121
    + reference 122

    • lexeme 2
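
    The structure described above can be sketched with plain objects. The example below assumes hypothetical property names (`id`, `lexemeId`, `meaningId`, `referenceId`, `position`) and uses the MeRef join records, sorted by `position`, to list each meaning's references in order. It only illustrates the data model, not how Tap Forms stores or exposes linked records.

```javascript
// Hypothetical sample data mirroring the Lexemes, Meanings, MeRef and References forms.
const lexemes = [{ id: "L1", text: "lexeme 1" }];
const meanings = [
    { id: "M11", lexemeId: "L1", text: "meaning 11" },
    { id: "M12", lexemeId: "L1", text: "meaning 12" },
];
const references = [
    { id: "R111", text: "reference 111" },
    { id: "R112", text: "reference 112" },
    { id: "R121", text: "reference 121" },
];
// Join records: one per (meaning, reference) pair, with an explicit order.
const meRefs = [
    { meaningId: "M11", referenceId: "R112", position: 2 },
    { meaningId: "M11", referenceId: "R111", position: 1 },
    { meaningId: "M12", referenceId: "R121", position: 1 },
];

// Build the nested outline for one lexeme as plain text.
function buildOutline(lexeme) {
    const refById = new Map(references.map(ref => [ref.id, ref]));
    const lines = ["• " + lexeme.text];
    for (const meaning of meanings.filter(m => m.lexemeId === lexeme.id)) {
        lines.push("  – " + meaning.text);
        const links = meRefs
            .filter(link => link.meaningId === meaning.id)
            .sort((a, b) => a.position - b.position);
        for (const link of links) {
            lines.push("    + " + refById.get(link.referenceId).text);
        }
    }
    return lines.join("\n");
}

console.log(buildOutline(lexemes[0]));
```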

    Since the text chunks that make up meaning ij and reference ijk consist of an unpredictable mix of different languages with different writing directions, none of the field types Text, Note, or Markdown is an option.

    I thought I could sync my Tap Forms document to an Apache CouchDB database and then retrieve all the meanings ij and references ijk related to a certain lexeme, process that information into a Markdown file that renders the languages and math appropriately, convert the Markdown file to PDF, and display the PDF file in the Lexemes form for each record – all in an automated manner.
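
    One possible approach to the rendering step is to split each text chunk into runs by script and wrap every run in an HTML `span` that carries a writing direction and a class for font selection via CSS. The sketch below assumes the output goes to an HTML-capable Markdown renderer; the `tagRuns` helper and the class names are hypothetical, and only Arabic, Hebrew, Syriac and Greek are detected.

```javascript
// Scripts to detect, using Unicode property escapes (ES2018).
const SCRIPTS = [
    { name: "arabic", dir: "rtl", test: /\p{Script=Arabic}/u },
    { name: "hebrew", dir: "rtl", test: /\p{Script=Hebrew}/u },
    { name: "syriac", dir: "rtl", test: /\p{Script=Syriac}/u },
    { name: "greek", dir: "ltr", test: /\p{Script=Greek}/u },
];

function escapeHtml(text) {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Split text into runs of the same script and wrap non-default runs in spans.
function tagRuns(text) {
    const runs = [];
    for (const char of text) {
        const found = SCRIPTS.find(script => script.test.test(char));
        const isLatin = /\p{Script=Latin}/u.test(char);
        const last = runs[runs.length - 1];
        // Neutral characters (spaces, punctuation, digits, combining marks)
        // stay in the current run instead of starting a new one.
        if (!found && !isLatin && last) {
            last.text += char;
            continue;
        }
        const script = found || null; // null marks the default (unwrapped) run
        if (last && last.script === script) {
            last.text += char;
        } else {
            runs.push({ script, text: char });
        }
    }
    return runs
        .map(run => run.script
            ? `<span class="${run.script.name}" dir="${run.script.dir}">${escapeHtml(run.text)}</span>`
            : escapeHtml(run.text))
        .join("");
}

console.log(tagRuns("root \u0643\u062A\u0628"));
```

    A stylesheet could then assign a dedicated font to each class, for example one font for `.syriac` and another for `.greek`. Note that trailing spaces are kept inside the preceding run, which is a simplification.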

    But I’m at a loss as to how to deal with the Apache CouchDB database related to my Tap Forms document, and exporting every lexeme with its meanings and references seems time-consuming and error-prone. Does anybody have an idea how to solve the problem?

    Cheers,

    Goutalco
