
Question

Asked: 2020-03-18 05:23:58 +0800 CST

Why doesn't cout show vowels with tildes or "ñ" with gcc 4.9.4?

I have no idea why this happens. Whenever my program processes the characters of a string, and the string has accented vowels or ñ, it transforms them and they are not displayed properly.

4 Answers

Joaquin Pereira · Answer 1 · 2020-03-18T06:41:55+08:00

This is due to the locale your program is running with. An example of setting the locale would be:

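#include <clocale>
#include <iostream>
using namespace std;

int main()
{
  // setlocale must be called inside a function, not at namespace scope.
  // The locale name "es_ES" is platform dependent and may not be installed.
  setlocale(LC_ALL, "es_ES");
  cout << "áéíóú\n";
  return 0;
}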

You can see more information about this at:

Localization functions in C

Trauma · Answer 2 · 2020-03-18T11:34:38+08:00

For quick understanding:

std::string( "ab" ).size( ); => 2;

std::string( "ññ" ).size( ); => 4;
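The gap between the two sizes above comes from size( ) counting bytes, not characters. As a rough sketch (the function name countCodePoints is only for illustration, and it assumes the string holds valid UTF-8), you can count characters by skipping the continuation bytes:

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 characters (code points) by skipping continuation bytes,
// which always have the bit pattern 10xxxxxx.
// Assumption: the input is valid UTF-8; no error checking is done.
std::size_t countCodePoints( const std::string &text ) {
  std::size_t count = 0;

  for( unsigned char byte : text ) {
    if( ( byte & 0xC0 ) != 0x80 )   // not a continuation byte: a character starts here
      ++count;
  }

  return count;
}
```

With this, countCodePoints( "\xC3\xB1\xC3\xB1" ) (the bytes of "ññ") gives 2, while size( ) on the same string gives 4.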

You cannot display UTF-8 characters as if each byte were an ASCII character.

The only solution you have is to check, one by one, whether the characters are valid ASCII (7 bits). Any character that does not meet that rule occupies more than 1 byte, and all of its bytes have to be output together.

Every byte of a non-ASCII UTF-8 character has bit 8 (value 128) set to 1, so the check is simple:

if( character & 128 ) {

If you find any character that meets the above, you are facing UTF-8.
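As a minimal sketch, the check above can be wrapped in a function that scans a whole string. The name hasNonAscii is made up for this example:

```cpp
#include <string>

// Returns true if any byte has its highest bit (value 128) set,
// which means the string is not plain 7-bit ASCII.
bool hasNonAscii( const std::string &text ) {
  for( unsigned char byte : text ) {
    if( byte & 128 )
      return true;
  }

  return false;
}
```

Note that a set high bit only tells you the byte is not ASCII. Whether the bytes really form UTF-8 (and not, say, Latin-1) depends on how the text was encoded.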

When faced with characters of this type, you have to use some library to extract each one and convert it into a string, and then display that string.

Keep in mind that you can find more than one UTF-8 character in a row, so you can't take the easy way of appending bytes to an auxiliary string for as long as the check succeeds. You may also run into invalid UTF-8 sequences.
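To give an idea of what detecting invalid sequences involves, here is a rough sketch of a purely structural check. The name looksLikeUtf8 is hypothetical, and the function only verifies that each lead byte is followed by the right number of continuation bytes; it does not reject overlong encodings, surrogates, or code points above U+10FFFF, so a real library is still the better choice:

```cpp
#include <cstddef>
#include <string>

// Structural check only: every lead byte must be followed by the expected
// number of continuation bytes (10xxxxxx). Overlong encodings, surrogates
// and out-of-range code points are NOT detected.
bool looksLikeUtf8( const std::string &text ) {
  std::size_t i = 0;

  while( i < text.size( ) ) {
    const unsigned char lead = static_cast< unsigned char >( text[i] );
    std::size_t extra;

    if( ( lead & 0x80 ) == 0x00 )      extra = 0;   // 0xxxxxxx: ASCII
    else if( ( lead & 0xE0 ) == 0xC0 ) extra = 1;   // 110xxxxx
    else if( ( lead & 0xF0 ) == 0xE0 ) extra = 2;   // 1110xxxx
    else if( ( lead & 0xF8 ) == 0xF0 ) extra = 3;   // 11110xxx
    else return false;                              // stray continuation or invalid lead byte

    if( extra > text.size( ) - i - 1 )
      return false;                                 // sequence is cut off at the end

    for( std::size_t k = 1; k <= extra; ++k ) {
      if( ( static_cast< unsigned char >( text[i + k] ) & 0xC0 ) != 0x80 )
        return false;                               // expected a continuation byte
    }

    i += extra + 1;
  }

  return true;
}
```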

I think Windows provides functions for these things. On Linux, you can use ICU.

EDIT

I never had the need to extract individual characters from a ::std::string... until reading this question ;-)

After a few unexpected annoyances with template< >, I wrote this one, which lets you iterate over the individual characters of UTF-8 strings, whether they are in a const char *VAR="..." or a ::std::string( "..." ). It's not the coolest thing in the world, but it illustrates how to check whether a character is UTF-8 or not, and how to treat it depending on its width. It does not take into account possible errors in the UTF-8 encoding; it is only for learning purposes:

// utf8iterator.hpp

#ifndef UTF8ITERATOR_HPP
#define UTF8ITERATOR_HPP

#include <cstddef>

template< typename T > struct utf8iterator {    
  T ptr;
  ::size_t size;
  char bytes[5];

  utf8iterator( const T &p ) :
    ptr( p ),
    size( 0 )
  {
    bytes[4] = 0;
  }
  utf8iterator &operator=( const T &iter ) {
    ptr = iter;
    size = 0;
    return *this;
  }

  bool operator==( const utf8iterator< T > &other ) const noexcept { return ptr == other.ptr; }
  bool operator!=( const utf8iterator< T > &other ) const noexcept { return ptr != other.ptr; }

  ::size_t calculateSize( ) const {
    if( ( *ptr & 248 ) == 240 ) {
      return 4;
    } else if( ( *ptr & 240 ) == 224 ) {
      return 3;
    } else if( ( *ptr & 224 ) == 192 )
      return 2;

    return 1;
  }
  utf8iterator &operator++( ) {
    if( size ) {
      ptr += size;
      size = 0;
    } else
      ptr += calculateSize( );

    return *this;
  }
  utf8iterator operator++( int ) {
    utf8iterator tmp( *this );

    if( size ) {
      ptr += size;
      size = 0;
    } else
      ptr += calculateSize( );

    return tmp;
  }
  void update( ) {
    ::size_t c;
    T iter( ptr );

    size = calculateSize( );

    for( c = 0; c != size; ++c ) {
      bytes[c] = *iter;
      ++iter;
    }

    if( size != 4 )
      bytes[size] = 0;
  }
  operator const char *( ) {
    if( !size )
      update( );

    return bytes;
  }
};

#endif

A small test/example program, showing its use:

// main.cpp

#include <iostream>
#include <string>

#include "utf8iterator.hpp"

int main( void ) {
  const char *test = "abcdeññ";
  std::string str( test );

  utf8iterator< const char * > charIter( test );
  utf8iterator< std::string::iterator > strIter( str.begin( ) );

  while( *charIter ) {
    std::cout << charIter << ": ";
    std::cout << charIter.size << "\n";
    ++charIter;
  }

  while( strIter != str.end( ) ) {
    std::cout << strIter << ": ";
    std::cout << strIter.size << "\n";
    ++strIter;
  }

  std::cout << std::endl;

  return 0;
}

After compiling it with g++ -I . -std=c++11 -Wall -pedantic main.cpp, it shows the following result:

a: 1
b: 1
c: 1
d: 1
e: 1
ñ: 2
ñ: 2
a: 1
b: 1
c: 1
d: 1
e: 1
ñ: 2
ñ: 2

It properly displays individual characters, both in char * and std::string, regardless of how many bytes they occupy.

Angel Angel · Answer 3 · 2020-03-19T01:01:44+08:00

I don't know whether you solved this issue, but I see comments like this one:

I used gnu++11's std::locale, then cout.imbue( locale( "" ) ); it still shows me the characters incorrectly...

You can make use of the following to display it how you want:

#include <iostream>
#include <locale>
#include <string>

using namespace std;

int main() {
    // your code goes here

    ios_base::sync_with_stdio(false);
    wcout.imbue(locale("en_US.UTF-8"));

    for (auto const&t : wstring (L"áéíóú")){
        wcout << t;
    }
    return 0;
}

Test on Ideone
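One caveat with the code above: constructing locale("en_US.UTF-8") throws std::runtime_error when that locale is not installed. A possible sketch of a fallback follows; the helper names localeAvailable and pickLocale are invented for this example, and which locale names exist depends entirely on your system:

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Returns true if the named locale can be constructed on this system.
bool localeAvailable( const std::string &name ) {
  try {
    std::locale probe( name );
    (void) probe;
    return true;
  } catch( const std::runtime_error & ) {
    return false;
  }
}

// Tries the preferred name first, then the user's environment locale (""),
// and finally falls back to the classic "C" locale.
std::locale pickLocale( const std::string &preferred ) {
  if( localeAvailable( preferred ) )
    return std::locale( preferred );

  if( localeAvailable( "" ) )
    return std::locale( "" );

  return std::locale::classic( );
}
```

You would then write wcout.imbue( pickLocale( "en_US.UTF-8" ) ); instead of constructing the locale directly. If it falls back to the "C" locale, the accented characters will probably still not display, but the program will not terminate with an uncaught exception.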

Info:

wstring: wstring (L"áéíóú")

wcout: wcout << t;

sync_with_stdio: ios_base::sync_with_stdio(false);

Angel Moreno · Answer 4 · 2020-04-04T06:22:35+08:00

You can iterate over the "bytes" of a string that is in UTF-8 and output those bytes elsewhere.

What you can never do is "interleave" characters/bytes (in this case end of line: the "endl") between those bytes that you are iterating, since there are characters that are made up of two bytes (the ñ, the á, etc) and are not "separable".

To illustrate the point above, this code works only for Unicode characters below 0x800 (that is, below 8*256; 'ñ' and 'á' are below 1*256), which UTF-8 encodes in at most two bytes:

#include <iostream>
#include <string>
using namespace std;

int main()
{
  for (auto const&l : string("áaéeiíóúñ")) {
    cout << l;
    if ((l&0xc0)!=0xc0)
      cout << endl;
  }
}

Output:

á
a
é
e
i
í
ó
ú
ñ

I have inserted line breaks between the output "bytes" only in "some cases", namely after every byte that is not the first byte of a two-byte character.
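If you need this for characters of any width, one possible variation (just a sketch; the name splitUtf8 is made up, and it assumes the input is valid UTF-8) is to start a new character at every byte that is not a continuation byte, instead of breaking after the second byte:

```cpp
#include <string>
#include <vector>

// Splits UTF-8 text into one string per character. A new character starts
// at every byte that is NOT a continuation byte (10xxxxxx).
// Assumption: the input is valid UTF-8; no error checking is done.
std::vector< std::string > splitUtf8( const std::string &text ) {
  std::vector< std::string > chars;

  for( char byte : text ) {
    const bool continuation =
      ( static_cast< unsigned char >( byte ) & 0xC0 ) == 0x80;

    if( !continuation || chars.empty( ) )
      chars.push_back( std::string( ) );   // start a new character

    chars.back( ) += byte;                 // append the byte to the current character
  }

  return chars;
}
```

Printing each element followed by endl then gives one character per line, without ever separating the bytes of a single character.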
