Asked: 2020-07-27 08:14:02 +0800 CST

How to extract data from a table with scraping in Ruby on Rails?

I plan to take a Ruby on Rails scraping MOOC. To get the most out of the course, I am reading some tutorials on scraping and trying to anticipate possible tasks.

I created a web page on github that simulates the data I want to extract:

https://rrg1459.github.io/extractData/

And I have the following helper, `clientes_helper.rb`, which extracts the data correctly:

```ruby
require 'open-uri'
require 'nokogiri'

module ClientesHelper

  # Downloads the page and stores each client field in an instance variable
  def consulta_cliente
    url = "https://rrg1459.github.io/extraerDatos/"
    begin
      @hayInternet = true
      # URI.open is needed for URLs: open-uri no longer patches Kernel#open as of Ruby 3.0
      doc = Nokogiri::HTML(URI.open(url))

      # Every <tr> on the page, including the rows of the outer layout tables
      inline_script = doc.search('//tr')
      inline_script.each do |script|
        linea = script.text.strip

        # Remove tabs, newlines, tag remnants and double spaces
        linea = linea.gsub("\t", "")
        linea = linea.gsub("\n", "")
        linea = linea.gsub("</table>", "")
        linea = linea.gsub("<tr>", "")
        linea = linea.gsub("<td>", "")
        linea = linea.gsub("</td>", "")
        linea = linea.gsub("<b>", "")
        linea = linea.gsub("</b>", "")
        linea = linea.gsub("</tr>", "")
        linea = linea.gsub("<td align=\"left\">", "")
        linea = linea.gsub("  ", "")
        linea = linea.gsub("<font color=\"#00387b\">", "")

        # Take the piece after the first ':' (up to any second ':') for each known label
        if /ID/ =~ linea
          @id_cliente = linea.split(':')[1]
        elsif /Nombre/ =~ linea
          @cliente_nombre = linea.split(':')[1]
        elsif /stado/ =~ linea
          @cliente_estado = linea.split(':')[1]
        elsif /REGULAR/ =~ linea
          @cliente_regular = linea.split(':')[1]
        elsif /Dirección/ =~ linea
          @direccion_cliente = linea.split(':')[1]
        end
      end
    rescue StandardError
      # Any failure (network or parsing) is treated as "no Internet"
      @hayInternet = false
    end
  end
end
```

But it looks horrible and it is not the best technique. I suspect it may offend some experienced eyes, and I apologize in advance.

I am well aware of the Ruby on Rails principle of "convention over configuration" and I try to stick to it, but I can't manage to make this code look clean, lean and precise.

And the truth is, I'm more tangled up than a dog eating gum.

Can someone point me to the best practice for extracting this data in an elegant, Ruby-like way?

Thank you in advance for any help you can give me.

1 Answer

Alter Lagos · Answer 1 · 2020-07-29T14:51:10+08:00

I think the most important improvement you can make to your code is to use Nokogiri correctly.
By searching `//tr` you get back, and iterate over, the 14 `tr` tags found in your HTML. That is inefficient: you loop over more tags than you need, and on top of that you have to strip all those line breaks and HTML tags (the tags, by the way, never need stripping, since `.text` already removes them).

You'd be better off accessing the 4 `tr` elements you need directly through their XPath:

```ruby
require 'open-uri'
require 'nokogiri'

# Same page and parser as the helper in the question
doc = Nokogiri::HTML(URI.open("https://rrg1459.github.io/extraerDatos/"))
inline_script = doc.search('/html/body/table/tbody/tr/td/table/tbody/tr[5]/td/table/tbody/tr[2]/td/table/tbody/tr')
inline_script[0].search('td[2]').text # "D-123456789"
inline_script[1].search('td[2]').text # "PEDRITO DE LOS PALOTES"
inline_script[2].search('td[2]').text # "SI"
inline_script[3].search('td[2]').text # "CALLE CARCA DE ALGO, AL LADO DE UNA ESQUINA"
```
