Discussion:
RFC: Add getcpu wrapper
(too old to reply)
H.J. Lu
2018-12-05 14:29:28 UTC
Permalink
To optimize for multi-node NUMA system, I need a very fast way to identity which
node the current process is running on. getcpu:

NAME
getcpu - determine CPU and NUMA node on which the calling thread is
running

SYNOPSIS
#include <linux/getcpu.h>

int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);

Note: There is no glibc wrapper for this system call; see NOTES.

DESCRIPTION
The getcpu() system call identifies the processor and node on which the
calling thread or process is currently running and writes them into the
integers pointed to by the cpu and node arguments. The processor is a
unique small integer identifying a CPU. The node is a unique small
identifier identifying a NUMA node. When either cpu or node is NULL
nothing is written to the respective pointer.

returns such info. But syscall () is too slow. I'd like to add a wrapper to
glibc. Any comments?

Thanks.
--
H.J.
Florian Weimer
2018-12-05 15:35:20 UTC
Permalink
Post by H.J. Lu
returns such info. But syscall () is too slow. I'd like to add a wrapper to
glibc. Any comments?
We already have it as sched_getcpu, I think.

Thanks,
Florian
H.J. Lu
2018-12-05 15:40:12 UTC
Permalink
Post by Florian Weimer
Post by H.J. Lu
returns such info. But syscall () is too slow. I'd like to add a wrapper to
glibc. Any comments?
We already have it as sched_getcpu, I think.
We have

int
sched_getcpu (void)
{
#ifdef __NR_getcpu
unsigned int cpu;
int r = INLINE_VSYSCALL (getcpu, 3, &cpu, NULL, NULL);

^^^^^ I need the node info.

return r == -1 ? r : cpu;
#else
__set_errno (ENOSYS);
return -1;
#endif
}
--
H.J.
Florian Weimer
2018-12-05 15:48:43 UTC
Permalink
Post by H.J. Lu
Post by Florian Weimer
Post by H.J. Lu
returns such info. But syscall () is too slow. I'd like to add a wrapper to
glibc. Any comments?
We already have it as sched_getcpu, I think.
We have
int
sched_getcpu (void)
{
#ifdef __NR_getcpu
unsigned int cpu;
int r = INLINE_VSYSCALL (getcpu, 3, &cpu, NULL, NULL);
^^^^^ I need the node info.
return r == -1 ? r : cpu;
#else
__set_errno (ENOSYS);
return -1;
#endif
}
Ah, so you want to pass a non-null second argument to getcpu?

I think we can expose getcpu (with two arguments), under that name. Add
it to <sched.h> or <unistd.h>; I do not have a strong preference.

Thanks,
Florian
Zack Weinberg
2018-12-05 16:00:17 UTC
Permalink
Post by Florian Weimer
Ah, so you want to pass a non-null second argument to getcpu?
I think we can expose getcpu (with two arguments), under that name. Add
it to <sched.h> or <unistd.h>; I do not have a strong preference.
What's the third argument for? Is there a concrete reason not to
expose it as well?

zw
H.J. Lu
2018-12-05 16:04:20 UTC
Permalink
Post by Zack Weinberg
Post by Florian Weimer
Ah, so you want to pass a non-null second argument to getcpu?
I think we can expose getcpu (with two arguments), under that name. Add
it to <sched.h> or <unistd.h>; I do not have a strong preference.
What's the third argument for? Is there a concrete reason not to
expose it as well?
zw
The third argument to this system call is nowadays unused, and should
be specified as NULL unless portability to Linux 2.6.23 or earlier is
required (see NOTES).
--
H.J.
Carlos O'Donell
2018-12-05 17:33:50 UTC
Permalink
Post by H.J. Lu
To optimize for multi-node NUMA system, I need a very fast way to identity which
NAME
getcpu - determine CPU and NUMA node on which the calling thread is
running
SYNOPSIS
#include <linux/getcpu.h>
int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
Note: There is no glibc wrapper for this system call; see NOTES.
DESCRIPTION
The getcpu() system call identifies the processor and node on which the
calling thread or process is currently running and writes them into the
integers pointed to by the cpu and node arguments. The processor is a
unique small integer identifying a CPU. The node is a unique small
identifier identifying a NUMA node. When either cpu or node is NULL
nothing is written to the respective pointer.
returns such info. But syscall () is too slow. I'd like to add a wrapper to
glibc. Any comments?
I don't object to adding syscall wrappers, but your comment appears to indicate
that the glibc wrapper will be doing something more than just wrapping, what
do you have in mind?

On some architectures we might have vdso support for this, while on others it
will be a syscall. What's wrong with just using the fastest mechanism possible
and that's it?

I see that on x86 you have a vdso vgetcpu, and that lsl is one instruction and
loads the cpunode mask and ccpunode bits in one shot (atomic). So this should
work fine, but for all other callers I assume this will be a syscall.
--
Cheers,
Carlos.
H.J. Lu
2018-12-05 17:41:52 UTC
Permalink
Post by Carlos O'Donell
Post by H.J. Lu
To optimize for multi-node NUMA system, I need a very fast way to identity which
NAME
getcpu - determine CPU and NUMA node on which the calling thread is
running
SYNOPSIS
#include <linux/getcpu.h>
int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
Note: There is no glibc wrapper for this system call; see NOTES.
DESCRIPTION
The getcpu() system call identifies the processor and node on which the
calling thread or process is currently running and writes them into the
integers pointed to by the cpu and node arguments. The processor is a
unique small integer identifying a CPU. The node is a unique small
identifier identifying a NUMA node. When either cpu or node is NULL
nothing is written to the respective pointer.
returns such info. But syscall () is too slow. I'd like to add a wrapper to
glibc. Any comments?
I don't object to adding syscall wrappers, but your comment appears to indicate
that the glibc wrapper will be doing something more than just wrapping, what
do you have in mind?
I am enclosing a patch to add getcpu. Testing on x86-64, x32 and i686
are done. I am running build-many-glibcs.py as we speak.
Post by Carlos O'Donell
On some architectures we might have vdso support for this, while on others it
will be a syscall. What's wrong with just using the fastest mechanism possible
and that's it?
I see that on x86 you have a vdso vgetcpu, and that lsl is one instruction and
loads the cpunode mask and ccpunode bits in one shot (atomic). So this should
work fine, but for all other callers I assume this will be a syscall.
I need vdso getcpu to avoid syscall. I am working on a NUMA spinlock
library which depends on a very fast getcpu.

--
H.J.
H.J. Lu
2018-12-05 18:13:24 UTC
Permalink
Post by H.J. Lu
Post by Carlos O'Donell
Post by H.J. Lu
To optimize for multi-node NUMA system, I need a very fast way to identity which
NAME
getcpu - determine CPU and NUMA node on which the calling thread is
running
SYNOPSIS
#include <linux/getcpu.h>
int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
Note: There is no glibc wrapper for this system call; see NOTES.
DESCRIPTION
The getcpu() system call identifies the processor and node on which the
calling thread or process is currently running and writes them into the
integers pointed to by the cpu and node arguments. The processor is a
unique small integer identifying a CPU. The node is a unique small
identifier identifying a NUMA node. When either cpu or node is NULL
nothing is written to the respective pointer.
returns such info. But syscall () is too slow. I'd like to add a wrapper to
glibc. Any comments?
I don't object to adding syscall wrappers, but your comment appears to indicate
that the glibc wrapper will be doing something more than just wrapping, what
do you have in mind?
I am enclosing a patch to add getcpu. Testing on x86-64, x32 and i686
are done. I am running build-many-glibcs.py as we speak.
build-many-glibcs.py passed. OK for trunk?

https://sourceware.org/ml/libc-alpha/2018-12/msg00161.html
--
H.J.
Florian Weimer
2018-12-05 18:48:10 UTC
Permalink
+extern int __getcpu (unsigned *, unsigned *) __THROW;
“unsigned int”; perhaps use __typeof__ (getcpu)?
diff --git a/sysdeps/unix/sysv/linux/bits/sched.h b/sysdeps/unix/sysv/linux/bits/sched.h
index 34f27a7d9b..ea5d51a80d 100644
--- a/sysdeps/unix/sysv/linux/bits/sched.h
+++ b/sysdeps/unix/sysv/linux/bits/sched.h
@@ -86,6 +86,9 @@ extern int unshare (int __flags) __THROW;
/* Get index of currently used CPU. */
extern int sched_getcpu (void) __THROW;
+/* Get currently used CPU and NUMA node. */
+extern int getcpu (unsigned int *, unsigned int *) __THROW;
Can either pointer be NULL?
+__getcpu (unsigned *cpu, unsigned *node)
“unsigned int”.
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-affinity-getcpu.c
@@ -0,0 +1,292 @@
+/* Test case for CPU affinity functions with getcpu.
+ Copyright (C) 2018 Free Software Foundation, Inc.
I think this is a copy of another test, right?

Maybe we should move bits of it to support/?

We have a parallel discussion about a new requirement that all new
functions must have documentation in the manual. I don't know what will
come out of that.

Thanks,
Florian
H.J. Lu
2018-12-05 19:50:34 UTC
Permalink
+extern int __getcpu (unsigned *, unsigned *) __THROW;
“unsigned int”; perhaps use __typeof__ (getcpu)?
Fixed.
diff --git a/sysdeps/unix/sysv/linux/bits/sched.h b/sysdeps/unix/sysv/linux/bits/sched.h
index 34f27a7d9b..ea5d51a80d 100644
--- a/sysdeps/unix/sysv/linux/bits/sched.h
+++ b/sysdeps/unix/sysv/linux/bits/sched.h
@@ -86,6 +86,9 @@ extern int unshare (int __flags) __THROW;
/* Get index of currently used CPU. */
extern int sched_getcpu (void) __THROW;
+/* Get currently used CPU and NUMA node. */
+extern int getcpu (unsigned int *, unsigned int *) __THROW;
Can either pointer be NULL?
Yes. Either pointer can be NULL.
+__getcpu (unsigned *cpu, unsigned *node)
“unsigned int”.
Fixed.
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/tst-affinity-getcpu.c
@@ -0,0 +1,292 @@
+/* Test case for CPU affinity functions with getcpu.
+ Copyright (C) 2018 Free Software Foundation, Inc.
I think this is a copy of another test, right?
Maybe we should move bits of it to support/?
I updated the patch with a very simple test.
We have a parallel discussion about a new requirement that all new
functions must have documentation in the manual. I don't know what will
come out of that.
Here is the updated patch.
--
H.J.
Florian Weimer
2018-12-05 19:57:37 UTC
Permalink
Post by H.J. Lu
I updated the patch with a very simple test.
The simple test has a race condition unfortunately.

I think we could fold the test into the existing affinity tests.

The main question is whether getcpu is fast enough. It does not perform
a system call, but in my tests a while back, it was still rather slow.

The new rseq-based getcpu does not support NUMA nodes, it seems. Maybe
it's still possible to fix that.

Thanks,
Florian
H.J. Lu
2018-12-05 20:07:05 UTC
Permalink
Post by Florian Weimer
Post by H.J. Lu
I updated the patch with a very simple test.
The simple test has a race condition unfortunately.
I think we could fold the test into the existing affinity tests.
Sure, I will do that.
Post by Florian Weimer
The main question is whether getcpu is fast enough. It does not perform
a system call, but in my tests a while back, it was still rather slow.
It is fast enough for NUMA spinlock.
Post by Florian Weimer
The new rseq-based getcpu does not support NUMA nodes, it seems. Maybe
it's still possible to fix that.
That will be great. We can take advantage it when it is available.
--
H.J.
H.J. Lu
2018-12-05 20:14:11 UTC
Permalink
Post by H.J. Lu
Post by Florian Weimer
Post by H.J. Lu
I updated the patch with a very simple test.
The simple test has a race condition unfortunately.
I think we could fold the test into the existing affinity tests.
Sure, I will do that.
Like this.
Post by H.J. Lu
Post by Florian Weimer
The main question is whether getcpu is fast enough. It does not perform
a system call, but in my tests a while back, it was still rather slow.
It is fast enough for NUMA spinlock.
Post by Florian Weimer
The new rseq-based getcpu does not support NUMA nodes, it seems. Maybe
it's still possible to fix that.
That will be great. We can take advantage it when it is available.
--
H.J.
--
H.J.
Tulio Magno Quites Machado Filho
2018-12-07 12:50:53 UTC
Permalink
* NEWS: Mention getcpu.
* include/sched.h (__getcpu): New libc_hidden_proto.
* sysdeps/unix/sysv/linux/Makefile (sysdep_routines): Add getcpu.
* sysdeps/unix/sysv/linux/Versions (GLIBC_2.29): Add getcpu.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Add getcpu.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist: Likewise.
Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libc.abilist: Likewise.
Likewise.
Likewise.
Likewise.
I noticed the powerpc64le abilist file is missing in the ChangeLog.
NEWS | 3 ++
include/sched.h | 2 +
sysdeps/unix/sysv/linux/Makefile | 2 +-
sysdeps/unix/sysv/linux/Versions | 3 ++
sysdeps/unix/sysv/linux/aarch64/libc.abilist | 1 +
sysdeps/unix/sysv/linux/alpha/libc.abilist | 1 +
sysdeps/unix/sysv/linux/arm/libc.abilist | 1 +
sysdeps/unix/sysv/linux/bits/sched.h | 3 ++
sysdeps/unix/sysv/linux/getcpu.c | 38 +++++++++++++++++++
sysdeps/unix/sysv/linux/hppa/libc.abilist | 1 +
sysdeps/unix/sysv/linux/i386/libc.abilist | 1 +
sysdeps/unix/sysv/linux/ia64/libc.abilist | 1 +
.../sysv/linux/m68k/coldfire/libc.abilist | 1 +
.../unix/sysv/linux/m68k/m680x0/libc.abilist | 1 +
.../unix/sysv/linux/microblaze/libc.abilist | 1 +
.../sysv/linux/mips/mips32/fpu/libc.abilist | 1 +
.../sysv/linux/mips/mips32/nofpu/libc.abilist | 1 +
.../sysv/linux/mips/mips64/n32/libc.abilist | 1 +
.../sysv/linux/mips/mips64/n64/libc.abilist | 1 +
sysdeps/unix/sysv/linux/nios2/libc.abilist | 1 +
.../linux/powerpc/powerpc32/fpu/libc.abilist | 1 +
.../powerpc/powerpc32/nofpu/libc.abilist | 1 +
.../linux/powerpc/powerpc64/libc-le.abilist | 1 +
While you're making the correct changes. So, just a ChangeLog entry missing.

This patch will conflict with an earlier patch from Andreas Schwab:
https://sourceware.org/ml/libc-alpha/2018-11/msg00752.html
--
Tulio Magno
H.J. Lu
2018-12-07 14:49:00 UTC
Permalink
On Fri, Dec 7, 2018 at 4:51 AM Tulio Magno Quites Machado Filho
Post by Tulio Magno Quites Machado Filho
* NEWS: Mention getcpu.
* include/sched.h (__getcpu): New libc_hidden_proto.
* sysdeps/unix/sysv/linux/Makefile (sysdep_routines): Add getcpu.
* sysdeps/unix/sysv/linux/Versions (GLIBC_2.29): Add getcpu.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Add getcpu.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist: Likewise.
Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libc.abilist: Likewise.
Likewise.
Likewise.
Likewise.
I noticed the powerpc64le abilist file is missing in the ChangeLog.
Fixed.
Post by Tulio Magno Quites Machado Filho
NEWS | 3 ++
include/sched.h | 2 +
sysdeps/unix/sysv/linux/Makefile | 2 +-
sysdeps/unix/sysv/linux/Versions | 3 ++
sysdeps/unix/sysv/linux/aarch64/libc.abilist | 1 +
sysdeps/unix/sysv/linux/alpha/libc.abilist | 1 +
sysdeps/unix/sysv/linux/arm/libc.abilist | 1 +
sysdeps/unix/sysv/linux/bits/sched.h | 3 ++
sysdeps/unix/sysv/linux/getcpu.c | 38 +++++++++++++++++++
sysdeps/unix/sysv/linux/hppa/libc.abilist | 1 +
sysdeps/unix/sysv/linux/i386/libc.abilist | 1 +
sysdeps/unix/sysv/linux/ia64/libc.abilist | 1 +
.../sysv/linux/m68k/coldfire/libc.abilist | 1 +
.../unix/sysv/linux/m68k/m680x0/libc.abilist | 1 +
.../unix/sysv/linux/microblaze/libc.abilist | 1 +
.../sysv/linux/mips/mips32/fpu/libc.abilist | 1 +
.../sysv/linux/mips/mips32/nofpu/libc.abilist | 1 +
.../sysv/linux/mips/mips64/n32/libc.abilist | 1 +
.../sysv/linux/mips/mips64/n64/libc.abilist | 1 +
sysdeps/unix/sysv/linux/nios2/libc.abilist | 1 +
.../linux/powerpc/powerpc32/fpu/libc.abilist | 1 +
.../powerpc/powerpc32/nofpu/libc.abilist | 1 +
.../linux/powerpc/powerpc64/libc-le.abilist | 1 +
While you're making the correct changes. So, just a ChangeLog entry missing.
https://sourceware.org/ml/libc-alpha/2018-11/msg00752.html
One of us has to resolve the conflict.
--
H.J.
Joseph Myers
2018-12-05 20:43:00 UTC
Permalink
Post by Florian Weimer
We have a parallel discussion about a new requirement that all new
functions must have documentation in the manual. I don't know what will
come out of that.
It's not a new requirement at all. Documenting new interfaces was added
to <https://sourceware.org/glibc/wiki/Contribution%20checklist> on
2012-04-17.

There are questions about the details of when some function might be
excluded from being documented (such as if it's part of a family of
functions none of which are documented yet), and of how much documentation
should be included for a Linux-specific function whose main semantics
*are* to pass its arguments through to a Linux kernel syscall (but even
there you should document the argument and return types, error and
cancellation handling, what header declares the function under what
feature test macros - everything that is not part of the syscall interface
- even if one might link to external freely-licensed documentation of
exactly what the syscall does). But the basic principle that new
functions should be documented, in the absence of an explicitly stated
reason that documenting a given function is problematic, seems clear
enough - just like the principle that a new function should have test
coverage (in the absence of stated justification of being hard to test),
and that a new function needs a reason to be architecture-specific, for
example.

In this case, the proposed function has a prototype *different* from that
for the syscall, which makes it especially clear that documentation is
needed.

(The NEWS entry also needs to state that this function is Linux-specific.)
--
Joseph S. Myers
***@codesourcery.com
Zack Weinberg
2018-12-05 20:56:29 UTC
Permalink
Post by Joseph Myers
Post by Florian Weimer
We have a parallel discussion about a new requirement that all new
functions must have documentation in the manual.
It's not a new requirement at all. Documenting new interfaces was added
to <https://sourceware.org/glibc/wiki/Contribution%20checklist> on
2012-04-17.
I get the impression that Florian either believes that the existing
manual is no longer useful, or has some kind of principled objection
to adding new material to it. Either would be reason to open a
new_discussion about changing this rule.

Florian, would you mind, at your convenience, explaining why you do
not want to add new material to the manual?

zw
Joseph Myers
2018-12-05 21:28:55 UTC
Permalink
Post by Zack Weinberg
Post by Joseph Myers
Post by Florian Weimer
We have a parallel discussion about a new requirement that all new
functions must have documentation in the manual.
It's not a new requirement at all. Documenting new interfaces was added
to <https://sourceware.org/glibc/wiki/Contribution%20checklist> on
2012-04-17.
I get the impression that Florian either believes that the existing
manual is no longer useful, or has some kind of principled objection
to adding new material to it. Either would be reason to open a
new_discussion about changing this rule.
Florian, would you mind, at your convenience, explaining why you do
not want to add new material to the manual?
I would also suggest that people reread the previous discussion
<https://sourceware.org/ml/libc-alpha/2012-04/threads.html#00315>
(especially Roland's comments
<https://sourceware.org/ml/libc-alpha/2012-04/msg00988.html>).

I see then I was listing possible exceptions to requiring documentation
for functions from external standards and direct wrappers round system
calls <https://sourceware.org/ml/libc-alpha/2012-04/msg00724.html>. I
don't now think such exceptions are generally a good idea, but there may
be cases where we can't usefully say much about semantics beyond that a
function calls the Linux system call of the same name (plus the
information about things that aren't part of the syscall interface) - say
if we add a wrapper for the bpf syscall (bug 18271).

For the bulk of new functions added since that discussion coming from
external standards - for example, the C11 threads functions and the TS
18661 floating-point functions - documentation *has* been added to the
glibc manual (and in many cases not to Linux man-pages).
--
Joseph S. Myers
***@codesourcery.com
Florian Weimer
2018-12-07 13:11:28 UTC
Permalink
Post by Zack Weinberg
I get the impression that Florian either believes that the existing
manual is no longer useful, or has some kind of principled objection
to adding new material to it. Either would be reason to open a
new_discussion about changing this rule.
Florian, would you mind, at your convenience, explaining why you do
not want to add new material to the manual?
First of all, from my perspective, it's not about Texinfo at all. I
actually prefer Texinfo to quite a few of the documentation markup
languages out there.

Visibility of the manual is poor compared to the man-pages project.
Academic citations are far fewer than for man7.org. I don't know why
this is.

Debian only ships a very old version of the manual, in the non-free
section, so Debian users have to rely on the web version.

I recently discovered ‘C-h S sprintf RET’ in Emacs, which takes you
directly to the documentation of the printf function, similar to ‘M-x
man RET sprintf RET’. But the manual is structured poorly to support
this kind of usage. For example, the subsection about sprintf does not
give you a link to the format strings accepted by sprintf.

All in all, it makes me feel that the manual is for a really small
audience, which makes work on it not very satisfying. I think we should
rather contribute to the man7.org manual pages and make sure that the
information found there is up to our standards.

And of course, the GFDL is problematic (it's the reason why Debian
doesn't ship the manual, and why we can't add Javascript code to improve
the web version, similar to what OpenJDK did), and so is the
U.S. regulatory notice. I feel someone needs to make a stand here,
otherwise things will never change.

I had a lengthy chat about this with Carlos yesterday, and my position
has mellowed somewhat. But I still find the situation frustrating.

Thanks,
Florian
H.J. Lu
2018-12-05 21:39:45 UTC
Permalink
Post by Joseph Myers
Post by Florian Weimer
We have a parallel discussion about a new requirement that all new
functions must have documentation in the manual. I don't know what will
come out of that.
It's not a new requirement at all. Documenting new interfaces was added
to <https://sourceware.org/glibc/wiki/Contribution%20checklist> on
2012-04-17.
There are questions about the details of when some function might be
excluded from being documented (such as if it's part of a family of
functions none of which are documented yet), and of how much documentation
should be included for a Linux-specific function whose main semantics
*are* to pass its arguments through to a Linux kernel syscall (but even
there you should document the argument and return types, error and
cancellation handling, what header declares the function under what
feature test macros - everything that is not part of the syscall interface
- even if one might link to external freely-licensed documentation of
exactly what the syscall does). But the basic principle that new
functions should be documented, in the absence of an explicitly stated
reason that documenting a given function is problematic, seems clear
enough - just like the principle that a new function should have test
coverage (in the absence of stated justification of being hard to test),
and that a new function needs a reason to be architecture-specific, for
example.
In this case, the proposed function has a prototype *different* from that
for the syscall, which makes it especially clear that documentation is
needed.
I added an entry to manual/process.texi.
Post by Joseph Myers
(The NEWS entry also needs to state that this function is Linux-specific.)
Fixed.

Here is the updated patch.
--
H.J.
Joseph Myers
2018-12-05 21:45:27 UTC
Permalink
Post by H.J. Lu
I added an entry to manual/process.texi.
Thanks. Note that @standards{} does not currently expand to any text in
the manual; it only generates entries in the library summary. So for now
function descriptions in the manual need to say explicitly what standards
they come from (i.e., that the function is Linux-specific, in this case)
and what headers declare the functions. (In many cases, one or both of
those pieces of information may well be stated in the relevant section of
the manual as applying to all the functions in it, rather than needing to
be repeated in each individual function description.)
--
Joseph S. Myers
***@codesourcery.com
Florian Weimer
2018-12-05 21:55:34 UTC
Permalink
I'm not sure if this is the right place. This is specifically not a
process attribute, after all. Perhaps it will fit better into the
discussion of scheduling-related functionality?
Can we unwind through vsyscalls? If not, the function is not AC-safe.
+the calling thread or process is currently running and writes them into
+processor is a unique small integer identifying a CPU. The node is a
“small nonnegative integer”
“small nonnegative integer” (not “identifier”)
+pointer.
Should we add a caveat that due to CPU hotplugging, the values might not
be so small after a while?
+
+Arguments point outside the calling process's address space.
I don't think the vsyscall can detect the EFAULT error condition
reliably. Should we really document it?

What about ENOSYS? Is this system call really supported on all
architectures?

Thanks,
Florian
H.J. Lu
2018-12-06 20:26:36 UTC
Permalink
Post by Florian Weimer
I'm not sure if this is the right place. This is specifically not a
process attribute, after all. Perhaps it will fit better into the
discussion of scheduling-related functionality?
I moved it to manual/resource.texi.
Post by Florian Weimer
Can we unwind through vsyscalls? If not, the function is not AC-safe.
Yes, see:

https://bugzilla.kernel.org/show_bug.cgi?id=201741
Post by Florian Weimer
+the calling thread or process is currently running and writes them into
+processor is a unique small integer identifying a CPU. The node is a
“small nonnegative integer”
I changed it to "nonnegative integer” .
Post by Florian Weimer
“small nonnegative integer” (not “identifier”)
I changed it to "nonnegative integer” .
Post by Florian Weimer
+pointer.
Should we add a caveat that due to CPU hotplugging, the values might not
be so small after a while?
I removed "small".
Post by Florian Weimer
+
+Arguments point outside the calling process's address space.
I don't think the vsyscall can detect the EFAULT error condition
reliably. Should we really document it?
Removed.
Post by Florian Weimer
What about ENOSYS? Is this system call really supported on all
architectures?
Added.

OK for master?

Thanks.
--
H.J.
Florian Weimer
2018-12-07 16:51:40 UTC
Permalink
+* The getcpu wrapper function has been added, which returns currently
+ used CPU and NUMA node. This function is Linux-specific.
“returns the currently used” (not sure)?
diff --git a/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c b/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c
index 695c1ccdbd..fd1357beb1 100644
--- a/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c
+++ b/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c
@@ -165,6 +165,18 @@ test_size (const struct conf *conf, size_t size)
for (int cpu = 0; cpu <= conf->last_cpu; ++cpu)
{
int active_cpu = sched_getcpu ();
+ unsigned int numa_cpu, numa_node;
+ if (getcpu (&numa_cpu, &numa_node))
!= NULL
+ {
+ printf ("error: getcpu: %m\n");
+ return false;
Okay, ENOSYS is a hard error here because we have checked for a working
sched_getcpu.
+ }
+ if ((unsigned int) active_cpu != numa_cpu)
+ {
+ printf ("error: Unexpected CPU %d, expected %d\n",
+ active_cpu, numa_cpu);
+ return false;
+ }
if (last_active_cpu >= 0 && last_active_cpu != active_cpu)
I think you need to move the getcpu call and this check down further,
after the setaffinity call with a single-element CPU set.

Thanks,
Florian
H.J. Lu
2018-12-07 17:01:40 UTC
Permalink
+* The getcpu wrapper function has been added, which returns currently
+ used CPU and NUMA node. This function is Linux-specific.
“returns the currently used” (not sure)?
Fixed.
diff --git a/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c b/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c
index 695c1ccdbd..fd1357beb1 100644
--- a/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c
+++ b/sysdeps/unix/sysv/linux/tst-skeleton-affinity.c
@@ -165,6 +165,18 @@ test_size (const struct conf *conf, size_t size)
for (int cpu = 0; cpu <= conf->last_cpu; ++cpu)
{
int active_cpu = sched_getcpu ();
+ unsigned int numa_cpu, numa_node;
+ if (getcpu (&numa_cpu, &numa_node))
!= NULL
I changed to

if (getcpu (&numa_cpu, &numa_node) != 0)

since getcpu returns int.
+ {
+ printf ("error: getcpu: %m\n");
+ return false;
Okay, ENOSYS is a hard error here because we have checked for a working
sched_getcpu.
+ }
+ if ((unsigned int) active_cpu != numa_cpu)
+ {
+ printf ("error: Unexpected CPU %d, expected %d\n",
+ active_cpu, numa_cpu);
+ return false;
+ }
if (last_active_cpu >= 0 && last_active_cpu != active_cpu)
I think you need to move the getcpu call and this check down further,
after the setaffinity call with a single-element CPU set.
Done.

Here is the updated patch. OK for master?

Thanks.
--
H.J.
Florian Weimer
2018-12-07 17:15:41 UTC
Permalink
Post by H.J. Lu
Here is the updated patch. OK for master?
Looks okay to me now.

Thanks,
Florian
H.J. Lu
2018-12-07 20:29:50 UTC
Permalink
Post by Florian Weimer
Post by H.J. Lu
Here is the updated patch. OK for master?
Looks okay to me now.
This is needed for Hurd.

Tested with build-many-glibcs.py. OK for master?
--
H.J.
Samuel Thibault
2018-12-07 20:38:04 UTC
Permalink
Post by H.J. Lu
Post by Florian Weimer
Post by H.J. Lu
Here is the updated patch. OK for master?
Looks okay to me now.
This is needed for Hurd.
Tested with build-many-glibcs.py. OK for master?
I was testing the same patch :)

Please commit :)

Samuel
H.J. Lu
2018-12-07 20:49:07 UTC
Permalink
On Fri, Dec 7, 2018 at 12:38 PM Samuel Thibault
Post by Samuel Thibault
Post by H.J. Lu
Post by Florian Weimer
Post by H.J. Lu
Here is the updated patch. OK for master?
Looks okay to me now.
This is needed for Hurd.
Tested with build-many-glibcs.py. OK for master?
I was testing the same patch :)
Please commit :)
Done.

Thanks.
--
H.J.
Szabolcs Nagy
2018-12-10 12:20:11 UTC
Permalink
Post by H.J. Lu
Post by Carlos O'Donell
Post by H.J. Lu
To optimize for multi-node NUMA system, I need a very fast way to identity which
NAME
getcpu - determine CPU and NUMA node on which the calling thread is
running
...
Post by H.J. Lu
Post by Carlos O'Donell
I see that on x86 you have a vdso vgetcpu, and that lsl is one instruction and
loads the cpunode mask and ccpunode bits in one shot (atomic). So this should
work fine, but for all other callers I assume this will be a syscall.
I need vdso getcpu to avoid syscall. I am working on a NUMA spinlock
library which depends on a very fast getcpu.
hm i thought rseq will let you access cpuid faster than vdso.
(and that can even protect against preemptive scheduling.
i don't know much about numa, but i'd expect the cpuid to
num
Florian Weimer
2018-12-10 12:22:06 UTC
Permalink
Post by Szabolcs Nagy
Post by H.J. Lu
Post by Carlos O'Donell
Post by H.J. Lu
To optimize for multi-node NUMA system, I need a very fast way to identity which
NAME
getcpu - determine CPU and NUMA node on which the calling thread is
running
...
Post by H.J. Lu
Post by Carlos O'Donell
I see that on x86 you have a vdso vgetcpu, and that lsl is one instruction and
loads the cpunode mask and ccpunode bits in one shot (atomic). So this should
work fine, but for all other callers I assume this will be a syscall.
I need vdso getcpu to avoid syscall. I am working on a NUMA spinlock
library which depends on a very fast getcpu.
hm i thought rseq will let you access cpuid faster than vdso.
(and that can even protect against preemptive scheduling.
i don't know much about numa, but i'd expect the cpuid to
numa node mapping to be fixed and then cpuid is enough)
The mapping is not fixed, due to CPU hotplug/unplug, VM migration and
CRIU.

Thanks,
Florian

Loading...