rgw collector not working #218
Comments
For mimic onwards we've switched from collectd to the embedded prometheus exporter in ceph-mgr. All perf stats are sent to the mgr anyway, so switching to this approach eliminates a lot of 'moving parts'.
However, if you're happy with collectd we should be able to help too. Starting from the top, the collector probes the host to look for the radosgw binary, and if found enables stats gathering. So in your collectd log on your rgw host are you seeing "Roles detected .... rgw: True"?
By default the collector should be running in debug mode, which will give us some more info in /var/log/collectd-cephmetrics.log - could you upload that to pastebin and drop the link in here?
Another common problem with access is selinux - is it possible that selinux is blocking?
|
Thank you for your prompt response.
The role is detected but the socket is not found.
The interesting part is that this server is OSD and RGW only, not MON, so
maybe the detection is not working as expected?
osd01.chi.medavail.net collectd[4959]: cephmetrics: Roles detected - mon:True osd:True rgw:True iscsi:False
2018-08-16 08:01:25,086 - DEBUG - [osd.py:132:_fetch_osd_stats() - fetching osd stats for osd 0
2018-08-16 08:01:25,101 - DEBUG - [base.py:62:_admin_socket() - admin_socket call 'perf dump' : 0.015s
2018-08-16 08:01:25,101 - DEBUG - [osd.py:312:_stats_lookup() - OSD perf dump stats collected for 1 OSDs in 0.016s
2018-08-16 08:01:25,101 - INFO - [osd.py:353:get_stats() - osd get_stats call : 0.016s
2018-08-16 08:01:35,085 - DEBUG - [base.py:62:_admin_socket() - admin_socket call 'perf dump' : 0.000s
2018-08-16 08:01:35,085 - WARNING - [mon.py:619:get_stats() - MON socket is not available...is ceph-mon active?
2018-08-16 08:01:35,085 - INFO - [mon.py:624:get_stats() - mon get_stats call : 0.000s
2018-08-16 08:01:35,085 - WARNING - [rgw.py:95:get_stats() - RGW socket not available...radosgw running?
2018-08-16 08:01:35,085 - INFO - [rgw.py:101:get_stats() - RGW get_stats : 0.000s
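For anyone checking the same thing on their own host, the "Roles detected" line can be parsed rather than read by eye. This is only a sketch: parse_roles is a hypothetical helper, not part of cephmetrics, and it assumes the log line keeps the "name:True/False" layout shown above.

```python
import re

# Matches pairs such as "rgw:True" or "iscsi: False".
ROLE_PAIR = re.compile(r"(\w+):\s*(True|False)")


def parse_roles(line):
    """Return {role: bool} from a 'Roles detected' log line, or None."""
    marker = "Roles detected"
    idx = line.find(marker)
    if idx == -1:
        return None
    # Only look at the text after the marker, so the syslog prefix is ignored.
    tail = line[idx + len(marker):]
    return {name: value == "True" for name, value in ROLE_PAIR.findall(tail)}


line = ("osd01.chi.medavail.net collectd[4959]: cephmetrics: Roles detected - "
        "mon:True osd:True rgw:True iscsi:False")
roles = parse_roles(line)
print(sorted(name for name, active in roles.items() if active))
```

Note that the line reports mon:True even though this host runs no monitor, which is why the MON socket warning shows up a few lines later.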
|
Any chance you can look into this? Thanks |
Role detection is done through the presence of the ceph binaries for a mon/osd/rgw...so if you deploy all of these components it will look as though all those roles are active. Is selinux active/blocking? |
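A rough sketch of what binary-based role detection amounts to, for illustration only. This is not the cephmetrics code: detect_roles, ROLE_BINARIES and the search directories are assumptions, and only the radosgw binary is named explicitly in this thread.

```python
import os

# Assumed mapping of role to binary name; hypothetical, for illustration.
ROLE_BINARIES = {"mon": "ceph-mon", "osd": "ceph-osd", "rgw": "radosgw"}


def detect_roles(search_dirs=("/usr/bin", "/usr/sbin")):
    """Report a role as present when its binary exists in any search dir."""
    roles = {}
    for role, binary in ROLE_BINARIES.items():
        roles[role] = any(
            os.path.isfile(os.path.join(directory, binary))
            for directory in search_dirs
        )
    return roles


print(detect_roles())
```

A check like this only says that a package is installed, not that the daemon is running, which would explain a host reporting mon:True without an active ceph-mon.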
Selinux is disabled
Really appreciate you taking the time to look into this
|
Also no firewall
|
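If selinux is not blocking then we need to check the output we're expecting. Can you run the following:
1. Check that the hostname resolves as expected (from the dir where the collectors are installed)
from collectors.common import get_hostname
print(get_hostname())
If that's ok, move on to 2.
2. Check the content of the perf dump from the radosgw socket
from ceph_daemon import admin_socket
from collectors.common import get_hostname
import json
import glob
h = get_hostname()
print("host is {}".format(h))
sockets = glob.glob('/var/run/ceph/ceph-client.rgw.{}.*asok'.format(h))
print("{} Socket files found: {}".format(len(sockets), ','.join(sockets)))
raw = admin_socket(sockets[0], ['perf','dump'], format='json')
resp = json.loads(raw)
# key name as built by the collector's rgw.py
print(resp.get('client.rgw.{}'.format(h)))
The last line should print a dict of rgw counters such as qlen, get, put, req and the put/get latency sums. |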
pwd
/usr/lib64/collectd/python-plugins/collectors
[root@mon01 collectors]# ls
base.py base.pyc common.py common.pyc __init__.py __init__.pyc
iscsi.py iscsi.pyc mon.py mon.pyc osd.py osd.pyc rgw.py rgw.pyc
test.py
Using collectors.common did not work:
from collectors.common import get_hostname
ImportError: No module named collectors.common
However, using just common worked.
The RGW socket test also worked, as long as I was using "from common" instead
of "from collectors.common".
RGW test results:
[root@osd01 ~]# ls /var/run/ceph
ceph-client.rgw.osd01.1519.94623440453632.asok ceph-osd.0.asok
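A likely reason for the ImportError is that Python was started inside the collectors directory itself, so common.py is importable directly but the collectors package (its parent directory) is not on sys.path. A minimal sketch of a workaround, assuming the path from the pwd output above; add_package_parent is a hypothetical helper, not something shipped with cephmetrics.

```python
import os
import sys


def add_package_parent(package_dir):
    """Put the directory that contains package_dir on sys.path."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent


# Path taken from the pwd output in this thread.
add_package_parent("/usr/lib64/collectd/python-plugins/collectors")
# After this, "from collectors.common import get_hostname" can resolve,
# provided the directory exists on the host.
```

Running the test script from /usr/lib64/collectd/python-plugins instead of from the collectors directory should have the same effect.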
…On Thu, 23 Aug 2018 at 23:22, pcuzner ***@***.***> wrote:
If selinux is not blocking then we need to check the output we're
expecting. Can you run the following:
1. Check that the hostname resolves as expected (from the dir where the
collectors are installed)
from collectors.common import get_hostname
print(get_hostname())
If that's ok, move on to 2.
2. Check the content of the perf dump from the radosgw socket
from ceph_daemon import admin_socket
from collectors.common import get_hostname
import json
import glob
h = get_hostname()
print("host is {}".format(h))
sockets = glob.glob('/var/run/ceph/ceph-client.rgw.{}.*asok'.format(h))
print("{} Socket files found: {}".format(len(sockets), ','.join(sockets)))
raw = admin_socket(sockets[0], ['perf','dump'], format='json')
resp = json.loads(raw)
# key name as built by the collector's rgw.py
print(resp.get('client.rgw.{}'.format(h)))
This should show something like
{u'qlen': 0, u'get': 492577570, u'failed_req': 0,
u'keystone_token_cache_miss': 0, u'req': 990144796, u'put_b':
258254121730048, u'keystone_token_cache_hit': 0, u'qactive': 0,
u'cache_miss': 3095675, u'put_initial_lat': {u'sum': 33889048.13001993,
u'avgtime': 0.137597955, u'avgcount': 246290346}, u'put': 246290358,
u'cache_hit': 995756713, u'get_initial_lat': {u'sum': 1447620.759260316,
u'avgtime': 0.005877737, u'avgcount': 246288767}, u'get_b': 258252243857025}
|
So I embedded some logging and it seems raw_stats is not populated
despite the socket being found.
INFO - [rgw.py:50:_get_rgw_data() - only one rgw_socket found with lenght['/var/run/ceph/ceph-client.rgw.osd01.1519.94623440453632.asok']
2018-08-27 15:59:22,212 - DEBUG - [base.py:62:_admin_socket() - admin_socket call 'perf dump' : 0.009s
2018-08-27 15:59:22,212 - INFO - [rgw.py:57:_get_rgw_data() - response is sent from socket 0 /var/run/ceph/ceph-client.rgw.osd01.1519.94623440453632.asok
2018-08-27 15:59:22,212 - INFO - [rgw.py:61:_get_rgw_data() - key si client.rgw.osd01
2018-08-27 15:59:22,212 - INFO - [rgw.py:91:get_stats() - raw stats isNone
2018-08-27 15:59:22,212 - WARNING - [rgw.py:100:get_stats() - RGW socket not available...radosgw running?
        rgw_sockets = glob.glob('/var/run/ceph/ceph-client.rgw.{}.*asok'.format(self.host_name))
        if rgw_sockets:
            lll = len(rgw_sockets)
            self.logger.info("only one rgw_socket found with lenght{}".format(rgw_sockets))
            if len(rgw_sockets) > 1:
                self.logger.warning("multiple rgw sockets found - "
                                    "data sent from {}".format(rgw_sockets[0]))
            response = self._admin_socket(socket_path=rgw_sockets[0])
            self.logger.info( "response is sent from socket 0 {}".format(rgw_sockets[0]))
            if response:
                key_name = 'client.rgw.{}'.format(self.host_name)
                self.logger.info( "key si {} ".format(key_name))
                return response.get(key_name)
            else:
                # admin_socket call failed
                self.logger.info( " socket failed on {}".format(self.host_name))
                return {}
        else:
            # no socket found on the host, nothing to send to caller
            self.logger.info( " no socket on {}".format(self.host_name))
            return {}

    @staticmethod
    def stats_filter(stats):
        # pick out the simple metrics
        filtered = {key: stats[key] for key in RGW.simple_metrics}
        for key in RGW.int_latencies:
            for _attr in stats[key]:
                new_key = "{}_{}".format(key, _attr)
                filtered[new_key] = stats[key].get(_attr)
        return filtered

    def get_stats(self):
        start = time.time()
        raw_stats = self._get_rgw_data()
        self.logger.info("raw stats is{}".format(raw_stats))
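Reading the log against this code: the "key si" line is logged, so the admin socket call returned a non-empty response, yet response.get(key_name) came back as None. That points to 'client.rgw.osd01' not being a top-level key of the perf dump on this host. The sketch below shows one way to check which section actually holds the rgw counters. find_rgw_section is a hypothetical helper, the fallback rule is an assumption, and the sample dict is made up, since the real section names are not shown in this thread.

```python
def find_rgw_section(response, host_name):
    """Return (key, counters) for the rgw section of a perf dump dict."""
    expected = "client.rgw.{}".format(host_name)
    if expected in response:
        return expected, response[expected]
    # Fallback (assumption): accept any section whose name mentions rgw.
    for key in sorted(response):
        if "rgw" in key:
            return key, response[key]
    return None, {}


# Made-up perf dump for illustration; real section names may differ.
sample = {"cct": {}, "rgw": {"req": 10, "get": 4}}
key, counters = find_rgw_section(sample, "osd01")
print("top-level keys:", sorted(sample))
print("rgw section:", key, counters)
```

Logging sorted(response) right before the response.get(key_name) call would show the real key names on the affected host.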
|
Hi,
The RGW socket is not discovered.
I have changed rgw_sockets to this, but still no luck:
rgw_sockets = glob.glob('/var/run/ceph/ceph-client.rgw.*.asok')
I am using Mimic and the socket looks like this:
/var/run/ceph/ceph-client.rgw.osd01.1519.94623440453632.asok
Any help will be truly appreciated as this is an AWESOME piece of software
Steven
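A quick way to check whether a given glob pattern matches this socket name without touching the collector is to test it with fnmatch, which uses the same wildcard rules glob applies to file names. This is only a sketch: socket_pattern is a hypothetical helper that mirrors the pattern used in rgw.py, and the fully qualified host name is included just to show what happens if the host name used in the pattern differs from the one in the socket file.

```python
from fnmatch import fnmatch

# Socket file name as reported in this issue.
SOCKET = "ceph-client.rgw.osd01.1519.94623440453632.asok"


def socket_pattern(host_name):
    """Build the per-host pattern in the same shape rgw.py uses."""
    return "ceph-client.rgw.{}.*asok".format(host_name)


for host in ("osd01", "osd01.chi.medavail.net"):
    print(host, fnmatch(SOCKET, socket_pattern(host)))
print("wildcard", fnmatch(SOCKET, "ceph-client.rgw.*.asok"))
```

If both the short host name pattern and the wildcard pattern match, the glob itself is unlikely to be the step that fails.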