Opened 5 years ago

Last modified 5 years ago

#19337 needs_work enhancement

Improve asteroidal triples code

Reported by: dcoudert Owned by:
Priority: major Milestone: sage-6.9
Component: graph theory Keywords:
Cc: ncohen Merged in:
Authors: David Coudert Reviewers:
Report Upstream: N/A Work issues:
Branch: u/dcoudert/asteroid (Commits) Commit: e32015d13649c7df56362fc9fd2b57937f465705
Dependencies: #19334 Stopgaps:

Description (last modified by dcoudert)

Fix some types problems raised in #19334 for asteroidal_triples.pyx. In particular, we get rid of uint32_t whenever possible.

Change History (27)

comment:1 Changed 5 years ago by dcoudert

  • Branch set to u/dcoudert/asteroid

comment:2 Changed 5 years ago by git

  • Commit set to aa2f9b32409dcad26fdcf75f69cd686b7952f513

Branch pushed to git repo; I updated commit sha1. New commits:

aa2f9b3trac #19337: better comments

comment:3 Changed 5 years ago by dcoudert

  • Cc ncohen added
  • Description modified (diff)
  • Status changed from new to needs_review

Using int should be enough for the size of graphs we can process.

comment:4 Changed 5 years ago by jdemeyer

  • Dependencies #19334 deleted

comment:5 follow-up: Changed 5 years ago by ncohen

Err... int are usually larget than 32bits ?...

comment:6 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by jdemeyer

Replying to ncohen:

Err... int are usually larget than 32bits ?...

In fact, no. On all systems I know, ints are exactly 32 bits.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 5 years ago by ncohen

In fact, no. On all systems I know, ints are exactly 32 bits.

Funny. I had somewhere in my head that int and long were the same size.

Anyway, why should we downgrade those uint32 to int exactly? O_o

Nathann

comment:8 Changed 5 years ago by ncohen

It seems from the comments on the other ticket that you would only need to replace "-1" with "-1L"?..

comment:9 in reply to: ↑ 7 Changed 5 years ago by jdemeyer

Replying to ncohen:

In fact, no. On all systems I know, ints are exactly 32 bits.

Funny. I had somewhere in my head that int and long were the same size.

on 32-bit systems that's true :-)

Note that the C standard doesn't say much about this. On Windows for example, long is always 32-bit (even on 64-bit systems). There might be systems where int is 64-bit.

comment:10 Changed 5 years ago by vbraun

  • Dependencies set to #19334

comment:11 follow-up: Changed 5 years ago by vbraun

The only guarantee by the C spec is that int is signed and at least 16 bit.

I'd recommend the C99 int_fast32_t type which will be guaranteed to be at least 32 bit and the fastest on the target system. IMHO you should never use the old int / long except to match legacy libraries, its bad style and very hard to write 100% standards-compliant code.

comment:12 Changed 5 years ago by git

  • Commit changed from aa2f9b32409dcad26fdcf75f69cd686b7952f513 to e32015d13649c7df56362fc9fd2b57937f465705

Branch pushed to git repo; I updated commit sha1. New commits:

92b960btrac #19337: Merged with 6.9.rc2
e32015dtrac #19337: try with int_fast32_t

comment:13 Changed 5 years ago by dcoudert

So let's try with int_fast32_t if it is better.

comment:14 in reply to: ↑ 11 ; follow-ups: Changed 5 years ago by jdemeyer

Replying to vbraun:

I'd recommend the C99 int_fast32_t type

-1.

Why should this graph code use a type of at least 32 bits? Where does the arbitrary number 32 come from? Why not int_fast16_t or int_fast64_t then?

If you don't want int, I'd go with long then which is also guaranteed to be at least 32 bits and which matches the actual return type of bitset_first_in_complement().

you should never use the old int / long except to match legacy libraries, its bad style and very hard to write 100% standards-compliant code.

Really, why that? I would say that you should always use types like long unless you need a specific bit-length.

comment:15 in reply to: ↑ 14 Changed 5 years ago by ncohen

Why should this graph code use a type of at least 32 bits? Where does the arbitrary number 32 come from? Why not int_fast16_t or int_fast64_t then?

I did not check this specific code but uint32_t probably comes from the data structure "static_sparse_graph", which uses them.

Then, as David said, this algorithm does not need uint32_t -- it would not run. The 31 bits of int32_t would do, and 16 bits would probably be sufficient.

Really, why that? I would say that you should always use types like long unless you need a specific bit-length.

Here we want to specify the bit length, because the smallest the type is the better it will be cached. We will read memory very very often in that code.

Nathann

comment:16 follow-up: Changed 5 years ago by jdemeyer

About the "fastest on the target system": that very much depends on the application. On 64-bit systems, it could easily be a 64-bit type. If you're making arrays of those (it seems that you do), you can end up slower than int32_t because of the larger memory accesses.

PS: I realize that we're bikeshedding here. I'm not really against int_fast32_t, I just wouldn't recommend it in this case. I would advice: if you're certain(!) that 32 bits is enough, use int32_t. Otherwise, use long.

comment:17 Changed 5 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:18 in reply to: ↑ 16 Changed 5 years ago by ncohen

About the "fastest on the target system": that very much depends on the application. On 64-bit systems, it could easily be a 64-bit type. If you're making arrays of those (it seems that you do), you can end up slower than int32_t because of the larger memory accesses.

Yes yes, I agree.

PS: I realize that we're bikeshedding here. I'm not really against int_fast32_t, I just wouldn't recommend it in this case. I would advice: if you're certain(!) that 32 bits is enough, use int32_t. Otherwise, use long.

Sooo... Would you say that there is actually *anything* that needs be changed in this file? We use fixed-size types, they are unsigned because that's what the data structure expects, and the change in Cython has been fixed by Volker with a cast.

Nathann

comment:19 Changed 5 years ago by dcoudert

Hello,

the algorithm has time complexity O(n^3) and space complexity O(n^2). So we will certainly not use it on graphs with 100,000 nodes (I can't on my laptop).

I was using uint32_t since static_sparse_graph uses it, but I agree that long is more appropriate with bitset_first_in_complement(). So I can use long for all indexes, or a more appropriate type if any. For the arrays, all values stored in that array are in range 1..n. So either I use a 16 bits type and add a test at the beginning of the method in case n needs more than that, or I use any type with at least 32 bits. Well, uint32_t is good for that.

Let me know.

David.

comment:20 in reply to: ↑ 14 ; follow-up: Changed 5 years ago by vbraun

If you need only 16 bits then there is of course int_fast16_t. Its definition is analogous to plain int except that the latter has no promise of being a particularly fast choice. I don't care about the actual bitwidth, it should just be reflected (and easily readable) in the code.

Replying to jdemeyer:

If you don't want int, I'd go with long then which is also guaranteed to be at least 32 bits

But not guaranteed to be particularly fast.

and which matches the actual return type of bitset_first_in_complement().

I.e. the "legacy libraries" case.

you should never use the old int / long except to match legacy libraries, its bad style and very hard to write 100% standards-compliant code.

Really, why that? I would say that you should always use types like long unless you need a specific bit-length.

But by using long you are already asking for a particular bit width, except that it does not document the intent (how many Sage contributors know the specs for long?). Usually, you see long because

  • the author mistakenly believes that it is the longest int type
  • the author mistakenly believes that it can hold a pointer
  • the author mistakenly believes that it is at least 64-bit
  • the author mistakenly believes that it is 64-bit on a 64-bit machine
  • cargo cult, i.e., to match legacy code

With the C99 data types you just need to answer

  • what is the minimum bit-width required
  • do I need to optimize for space or speed

and the resulting program inherently documents it. Its so much better, a total slam-dunk.

Last edited 5 years ago by vbraun (previous) (diff)

comment:21 in reply to: ↑ 20 ; follow-ups: Changed 5 years ago by jdemeyer

Replying to vbraun:

Usually, you see long because

  • the author mistakenly believes that it is the longest int type
  • the author mistakenly believes that it can hold a pointer
  • the author mistakenly believes that it is at least 64-bit
  • the author mistakenly believes that it is 64-bit on a 64-bit machine
  • cargo cult, i.e., to match legacy code

I can play that game too:

Usually, you see int_fastN_t because

  • the author mistakenly believes that his code will never need more than N bits (think "640K should be enough for everybody")
  • the author mistakenly believes that int_fastN_t will yield the fastest possible code (not true if the code is memory-bound)

what is the minimum bit-width required

I don't think that one should always answer this question. There are valid use cases for types which have different sizes on 32-bit and 64-bit systems.

comment:22 in reply to: ↑ 21 Changed 5 years ago by ncohen

I can play that game too:

Go play it somewhere else. This thing is a trac ticket. Review it or exchange private emails.

Nathann

comment:23 in reply to: ↑ 21 ; follow-up: Changed 5 years ago by vbraun

Replying to jdemeyer:

what is the minimum bit-width required

I don't think that one should always answer this question. There are valid use cases for types which have different sizes on 32-bit and 64-bit systems.

I agree that there are valid use cases, but

  • int/long are also terrible for that; They encode mostly encode historical accidents, e.g. long is different on 64-bit linux vs 64-bit windows on the same hardware
  • in Mathematics, correctness and reproducibility (including on different OS/hardware) are much more important than elsewhere. If you can't trust that the result is correct then neither speed nor memory usage matters.

comment:24 in reply to: ↑ 23 Changed 5 years ago by ncohen

  • in Mathematics, correctness and reproducibility (including on different OS/hardware) are much more important than elsewhere. If you can't trust that the result is correct then neither speed nor memory usage matters.

This, right after you decided to not include a stopgap in a code that returns wrong answers? That made me laugh.

comment:25 Changed 5 years ago by dcoudert

Today, with sage 6.9.rc3 and without this patch, I have a segfault!

With this patch it's working.

I don't know which is the best type to use and above discussion confuses me.

What I do know is that this patch solves a segfault and so is needed.

Best, David.

comment:26 Changed 5 years ago by ncohen

Do you know if the segfault is deterministic? Does it happen consistently or may it be related to something else? I do not see how changing the type may fix anything that is not already fixed by #19334.

Nathann

comment:27 Changed 5 years ago by dcoudert

sorry, I thought that #19334 was already merged. So, with #19334 I don't have segfault. I hope it will be merged shortly.

Note: See TracTickets for help on using tickets.