What is a promise in Javascript?

Question

Agustin G.

Asked: 2020-03-22 06:03:52 +0800 CST 2020-03-22 06:03:52 +0800 CST 2020-03-22 06:03:52 +0800 CST

Automate document loading, go from word to excel or similar

772

People, I hope you understand that this is a special situation. It is a question that does not follow the rules, but right now the world is a mess and I really need if someone can give me a hand with this. Fill me with negatives if you want but please don't close the question.

At my work we are manually uploading some files referring to the complaints reported by people regarding COVID-19. The issue is that this virus is growing exponentially, which is why our work is also growing, and we are behind in statistics, which is partly why the number of known cases is lower than it actually is. Without further ado, I'll show you what I have:

We receive a .doc file (ask in the chat before asking the question here, and they asked me the version of Word. Actually we work with different versions, I have the latest version at home because I am a student and MS gives it away and always updates it , but at work I have 2010, I think the version may not matter since I want to work with text, I could even copy it and take it to a notepad for example)

This .doc file has about 100 texts that have more or less this form

Illicit Investigation – Illicit Port (COVID-19 Coronavirus) – Senior Female 03/19 – PU. 81027 – High 07:06 am – CP Campana; 05:00 a.m. She received a call to XXXX (Dda. XXXX 1000), realizing that her neighbor XXX hers, returned from Brazil the day before, not complying with the isolation protocol. Personal proceeded to provide the corresponding telephone numbers for such cases, giving notice to Health personnel.

Each of these texts must be passed to a table in a database, which has about 20 fields. I would like to at least be able to get the 3 data that I put in bold, which are the date, the Urgent Part number, which is always after PU.and the time, which is always the one after the word alta(since there may be other times in the paragraph, anyway the time I'm looking for is the first one that appears, so you can have those two criteria, or the first appearance, or after "high")

If you don't know how to do it but you know tools that can do it, it would also be of great help to me

The thing is that we are passing them by hand, and this virus is growing by leaps and bounds with it the complaints, and we are not being able to upload this.

The idea is that each new text (could be identified by the appearance of the string PU.) creates a new row (it can be in an excel or access table, it doesn't matter) and place those 3 fields.

I hope you can understand an exceptional question at an exceptional time.

2 Answers

Voted

Mauricio Contreras · Answer 1 · 2020-03-22T08:38:19+08:00

One approach, using JavaScript (NodeJS) is to create a script scriptthat does an analysis of the Word document ( .docx) and writes the result to an Excel workbook ( .xlsx).

I have implemented this code very quickly to try to provide a solution to the problem, I understand that it is for an emergency.

You need:

I don't do case sensitive checks, this is a first approximation to a possible solution.

Using the module word-text-parserwe get the paragraphs of the Word document.

We filter each paragraph to keep those that exclusively contain the strings: PU. and High

We also create a regular expression to get the date, which we assume is always in the following format: XX/XX . Where XX are numeric digits from 00 to 99. (It is not validated if a date is valid, therefore 03/12 will be valid as well as 12/03).

Then we separate each paragraph into words, and we search for the words we want according to the aforementioned criteria.

Each value is placed in an object with the following structure:

{ PU: <cadena>, Alta: <cadena>, Fecha: <cadena>}

A list of these values is saved and an Excel workbook is generated using the ExcelJS library.

The final code may look like this:

const path = require('path');
const wordParser = require('word-text-parser');
const ExcelJS = require('exceljs');


/* La siguiente función se encarga de crear un documento de Excel
* a partir de una lista de datos recibida como parámetro
*
* @input dataList [Array]: Array con datos: { PU: <cadena>, Alta: <cadena>, Fecha: <cadena>}
*/
const createWorkBook = function(dataList) {
  if(!dataList || !Array.isArray(dataList) || !dataList.length) return;
  let wb = new ExcelJS.Workbook();
  wb.created = new Date();
  wb.creator = 'Someone';
  wb.views = [
    {
      x: 0, y: 0, width: 10000, height: 20000,
      firstSheet: 0, activeTab: 1, visibility: 'visible'
    }
  ]
  //creamos la hoja del libro
  let dataSheet = wb.addWorksheet('Data');

  //20 campos por registro, es lo que estableces en tu pregunta
  // los 3 datos van en las columnas 3, 5 y 7. con PU, alta y fecha respectivamente.

  // creamos la cabecera de datos
  let columnHeaders = [];
  for(let i = 0; i < 20; i++) {
    let header = {header: `Dato${i}`, key: `Dato${i}`, width: 30};
    if(i === 3) {
      header.header = 'PU';
      header.key = 'PU';
    }
    if(i === 5) {
      header.header = 'Alta';
      header.key = 'Alta';
    }
    if(i === 7) {
      header.header = 'Fecha';
      header.key = 'Fecha';
    }
    columnHeaders.push(header);
  }
  dataSheet.columns = columnHeaders;

  // Recorremos la lista y almacenamos los registros en la hoja
  dataList.forEach(data => {
    dataSheet.addRow({PU: data.PU, Alta: data.Alta, Fecha: data.Fecha});
  });

  // Escribimos el archivo resultante:
  wb.xlsx.writeFile(path.join(__dirname,'test.xlsx'))
  .then(() => {
    console.log('Done');
  })
  .catch(error => {
    console.error(`Algo salió mal: ${error.message}`);
  });
}

/*
* La siguiente función analiza los párrafos capturados del archivo de Word
* y devuelve una lista con los datos obtenidos de los mismos
* @input resultsList [Array]: Array con cadenas correspondientes a cada párrafo del archivo
* @output data [Array]: Array con datos: { PU: <cadena>, Alta: <cadena>, Fecha: <cadena>}
*/
const parseResult = function(resultsList) {
  if(!resultsList || !Array.isArray(resultsList) || !resultsList.length) return;

  // filtramos los párrafos que contengan "PU." y "Alta"
  const filteredParagraphs = resultsList.filter(p => p.includes('PU.') && p.includes('Alta'));

  // Array que almacenará los datos extraídos
  const data = [];

  // Expresión regular para evaluar la fecha.
  const regexFecha = /^\d\d\/\d\d$/i

  // recorremos cada párrafo
  filteredParagraphs.forEach(p => {
    //creamos el objeto para almacenar los datos
    let value = {
      PU: '',
      Alta: '',
      Fecha: ''
    }

    // separamos cada párrafo en palabras
    let splitedParagraph = p.split(' ');

    //analizamos cada palabra y capturamos los datos
    splitedParagraph.forEach((word, index) => {
      if(word.includes('PU.')) {
        value['PU'] = word.split('.')[1];
      }
      if(word === 'Alta') {
        value['Alta'] = splitedParagraph[index + 1];
      }
      if(regexFecha.test(word)) {
        value['Fecha'] = `${word}/2020`;
      }
    });

    // almacenamos el resultado en el array
    data.push(value);
  });

  // mostramos los datos por consola
  console.log(data);

  // llamamos al procedimiento para crear el libro de Excel a partir de los datos
  createWorkBook(data);
}

// nombre del archivo a analizar
const fileToParse = path.join(__dirname, 'demo.docx');

// llamamos al analizador del archivo
wordParser(fileToParse, parseResult);

I have a repository of this implemented on Github , there is 1 test Word file ( demo.docx), which has a random (unstructured) structure.

I hope that you can make progress with this. Any questions ask.

Sal · Answer 2 · 2020-03-22T07:43:04+08:00

Sal

2020-03-22T07:43:04+08:002020-03-22T07:43:04+08:00

With LibreOffice you can convert a .doc to .txt by command line:

soffice --headless --convert-to txt:Text archivo.doc

From there you can get what you need with regular expressions either with the same command line or with another programming language. By first dividing the text by the separator " - ". Even the data you need could be extracted from the database depending on the version.

2

Automate document loading, go from word to excel or similar

HTML button that sends you to another page

Why do I get the error "Call to undefined function mysql_connect()"?

How to create an HTML button that works as a link?

How to separate a String in Java. How to use split()

Filter by dates in sql server

How to limit the number of decimal places in a double?

For each in JavaScript?

Position footer ALWAYS glued to the footer

Definitive Guide to Type Conversion in Java

How to properly compare Strings (and objects) in Java?