 
Ticket metadata
Id: 2922
Status: resolved
Priority: 4/0
Queue: vdt-support

Fixed in: (no value)
Fix scheduled: (no value)

Owner: Alain Roy
Requestors: osg@tick-indy.globalnoc.iu.edu
Cc: jfrey@cs.wisc.edu
AdminCc:

Created: Tue Aug 21 14:39:36 2007
Starts: Not set
Started: Not set
Last Contact: Wed Sep 12 14:33:15 2007
Due: Not set
Closed: Tue Sep 18 17:21:28 2007
Updated: Tue Sep 18 17:21:28 2007 by roy



History
Subject: Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 21 Aug 2007 19:35 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
When replying, type your text above this line.
----------------------------------------------
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/21/2007 at 19:35:17 with the following information:

FootPrints Ticket Description:
VDT Support,

Please respond to the following issue.

Thank You,
Tim Silvers
OSG Grid Operations Center
goc@opensciencegrid.org, 317-278-9699
web: http://www.opensciencegrid.org
rss: http://www.grid.iu.edu/news/

From: hs@nhn.ou.edu
Subject: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working
Date: August 21, 2007 3:28:47 PM EDT
To: goc@opensciencegrid.org
Cc: adesmet@cs.wisc.edu
Reply-To: hs@nhn.ou.edu

Hi all,

GridEx jobs have just been resumed on OUHEP_ITB (osgitb1.nhn.ou.edu),
and immediately the load went through the roof again and is now at 7.

And when I checked, I saw that the grid_manager_monitor is apparently
still not working, since there is no such process, but rather
one globus-job-manager process for each submitted GridEx job.

Can we please get this resolved? It's been like that since at least June,
and we need this testbed for real testing and can't afford to have it
bogged down with GridEx jobs like this.

Thanks,

Horst
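
The symptom described in this message (no grid monitor process, but one globus-job-manager process per job) can be checked with a short script. The following is a rough Python sketch, not part of the original report. It assumes `ps -eo args=` is available on the gatekeeper and that the process names contain the strings used in this message. `count_matching` and `snapshot` are hypothetical helper names, and the sample command lines are made up for illustration.

```python
import subprocess
from typing import Dict, Iterable, Tuple

# Substrings taken from the wording of the report; real process names may differ.
PATTERNS: Tuple[str, ...] = ("grid_manager_monitor", "globus-job-manager")


def count_matching(command_lines: Iterable[str],
                   patterns: Tuple[str, ...] = PATTERNS) -> Dict[str, int]:
    """Count how many command lines contain each pattern."""
    counts = {pattern: 0 for pattern in patterns}
    for command in command_lines:
        for pattern in patterns:
            if pattern in command:
                counts[pattern] += 1
    return counts


def snapshot() -> Dict[str, int]:
    """Count matching processes on the local machine (assumes a POSIX ps)."""
    result = subprocess.run(["ps", "-eo", "args="],
                            capture_output=True, text=True, check=True)
    return count_matching(result.stdout.splitlines())


# Hypothetical ps output: two job managers and no grid monitor agent.
sample = [
    "globus-job-manager -conf /hypothetical/path -type condor",
    "globus-job-manager -conf /hypothetical/path -type condor",
    "sshd: hs@pts/0",
]
print(count_matching(sample))
```

A grid monitor count of zero alongside a globus-job-manager count that tracks the number of submitted jobs would match the behaviour reported here.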

Assignees: Operations Workgroup, Arvind Gopu, OSG Support Centers, VDT
Status: Support Agency
Originating VO Support Center: DOSAR
Destination VO Support Center: VDT
Originating Ticket Number:
Destination Ticket Number:

Thank You,
OSG Grid Operations Center
goc@opensciencegrid.org, 317-278-9699
info: http://www.opensciencegrid.org
rss: http://www.grid.iu.edu/news/
Subject: [vdt-support #2922] RR - Open Science Grid - 4004 - 4 - grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working ISSUE=4004 PROJ=71
Date: Tue, 21 Aug 2007 19:41 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" was assigned to you via round-robin on 2007-08-21 at 19:41:04 with the following information:

Footprints Ticket Description:
Entered on 08/21/2007 at 19:35:17 by Tim Silvers:
[Duplicate message snipped]
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 21 Aug 2007 19:41 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/21/2007 at 19:41:08 with the following information:

FootPrints Ticket Description:
Greetings from the VDT support system!

This message was generated automatically in response to the creation of a
ticket regarding:

Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71

Your original request is copied below for reference.

There is no need to reply to this message right now -- someone from the VDT
team will respond to you as soon as possible. If you wish to view your
support ticket online, visit:

http://vdt.cs.wisc.edu/rt/Ticket/Display.html?user=guest&pass=guest&id=2922

Your ticket has been assigned an ID as follows:

[vdt-support #2922]

Please include the ticket ID in the subject line of all future email about
this issue. To do so, you may reply to this message.

Thank you for your interest in the VDT.

-------------------------------------------------------------------------
[Duplicate message snipped]

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 21 Aug 2007 19:59 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/21/2007 at 19:59:07 with the following information:

FootPrints Ticket Description:
Hi again,

some more info which may have gotten lost since the last time we talked
about this.

So this is a RHEL5 machine with an osg-0.7.0 gatekeeper:

[hs@osgitb1 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux Client release 5 (Tikanga)
[hs@osgitb1 ~]$ uname -a
Linux osgitb1.nhn.ou.edu 2.6.18-8.1.8.el5 #1 SMP Mon Jun 25 17:06:19 EDT 2007 i686 i686 i386 GNU/Linux
[hs@osgitb1 ~]$ vdt-version
You have installed a subset of VDT version 1.8.0d:
Apache HTTPD 2.2.4
gLite CEMon Server (INFN release from 2006-05-19, plus RAW dialect) 1.7.1
CA Certificates v29 (includes IGTF 1.16 CAs)
CONDOR-DEVEL (Not an official part of the VDT)
EDG Make Gridmap 2.9.0
Fetch CRL 2.6.2
Generic Information Provider 1.0.15 (Iowa 15-Feb-2006)
Globus Toolkit, pre web-services, client 4.0.5
Globus Toolkit, pre web-services, server 4.0.5
Globus Toolkit, web-services, client 4.0.5
Globus Toolkit, web-services, server 4.0.5
GLUE Schema 1.2 draft 7
GPT 3.2
Gratia Condor Probe 0.26.2b-1
GRATIA_METRIC_PROBE (Not an official part of the VDT)
Java SDK 1.4.2_14
Java 5 SDK 1.5.0_12
KX509 20031111
Logrotate 3.7
MonALISA 1.6.16
MyProxy 3.9
MySQL 4.1.22
MySQL Connector/J 5.0.6
Pegaus Worker Package 2.0.1
PPDG Cert Scripts 2.4
PRIMA Authorization Module 0.6
RLS, client 3.0.041021
SRM V1 Client 1.25
SRM V2 Client 2.2.0.2
syslog-ng 2.0.4
Apache Tomcat 5.0.28
UberFTP 1.24
Wget 1.10.2

Please let me know if I can provide you with any other helpful information.

Thanks a lot,

Horst

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Thu, 23 Aug 2007 13:26 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/23/2007 at 13:26:08 with the following information:

FootPrints Ticket Description:
Hi again,

looks like osgitb1 has been completely removed from the GridEx list now.
This is not helpful, since we need to figure out why the grid monitor
isn't working, not ignore it altogether. Can we please put it back in
and try to solve this?

Thanks,

Horst

------------- Begin Forwarded Message -------------

Date: Thu, 23 Aug 2007 00:01:05 -0500
From: Grid Exerciser <grid-ex@cs.wisc.edu>
Subject: OSG-ITB Grid Exerciser Results (2007-08-23)
To: adesmet@cs.wisc.edu, jfrey@cs.wisc.edu, OSG-INT@OPENSCIENCEGRID.ORG
List-Owner: <mailto:OSG-INT-request@LISTSERV.FNAL.GOV>

Information on reading this report can be found at
http://www.cs.wisc.edu/condor/tools/exerciser/reading_report.html

General information on the Grid Exerciser can be found at
http://www.cs.wisc.edu/condor/tools/exerciser/

The maximum simultaneous jobs to any given site are currently throttled to 10.

Job duration: 900 sec
Maximum job duration for timeout: 10800 sec
Current run submitted at Wed Aug 22 10:30:42 2007.

Grid Exerciser (Experimental)
Results for OSG-ITB from Wed Aug 22 00:01:01 2007 through Thu Aug 23 00:01:01 2007
Report generated on Thu Aug 23 00:01:05 2007

Status Summary

Site                                          Simul Submit  Rec'd Timout   Done Errors Run Time
Default
citgrid3.cacr.caltech.edu/jobmanager-condor      10     10   1234      0      0   1235        0
cms-xen1.fnal.gov/jobmanager-condor              10     74     84      0     66     11       16
cms-xen9.fnal.gov/jobmanager-condor              10     71     81      0     61     10       15
cmsitbsrv01.fnal.gov/jobmanager-condor           10    443    443      0    433      0      114
cmssrv09.fnal.gov/jobmanager-condor              10    198    198      0    188      0       50
feynman.uits.iupui.edu/jobmanager-condor         10     10      0      0      0    460        0
fgitb-gk.fnal.gov/jobmanager-condor              10    454    454      0    444      0      117
gk.phys.sinica.edu.tw/jobmanager-condor          10     10     50     40      0      0        0
gridtest01.racf.bnl.gov/jobmanager-condor        10     10      0     40      0      0        0
grow-itb.its.uiowa.edu/jobmanager-pbs            10     10      0     40      0      0        0
ligo-itb.aset.psu.edu/jobmanager-pbs             10     22    176      0     12    361        3
osg-itb.ligo.caltech.edu/jobmanager-condor       10    452    452      0    442      0      115
osg-vtb.ligo.caltech.edu/jobmanager-condor       10     98    148      0     88     50       23
osggate.clemson.edu/jobmanager-condor            10    222    222      0    212      0       55
osp-vtb00.nersc.gov/jobmanager-sge               10     10      0      0      0   1280        0
pc1805.nersc.gov/jobmanager-sge                  10     10      0     40      0      0        0
pdsfgrid1/jobmanager-sge                         10     10      0     40      0      0        0
t2dev-osg.uchicago.edu/jobmanager-condor         10     10      0      0      0   1280        0
tb10.grid.iu.edu/jobmanager-condor               10     10      0      0      0    461        0
testwulf.hpcc.ttu.edu/jobmanager-pbs             10    222    222      0    212      0       55

GRAND TOTAL (20 sites)                          200   2356   3764    200   2158   5148      567

Globus Error Summary

Globus Error Codes:        7    17    79   121  ****  Failur
citgrid3.cacr.caltec       0  1235     0     0     0  100.0%
cms-xen1.fnal.gov/jo       1     0     0    10     0   14.3%
cms-xen9.fnal.gov/jo       0     0     0    10     0   14.1%
cmsitbsrv01.fnal.gov       0     0     0     0     0    0.0%
cmssrv09.fnal.gov/jo       0     0     0     0     0    0.0%
feynman.uits.iupui.e       0     0     0     0   460  100.0%
fgitb-gk.fnal.gov/jo       0     0     0     0     0    0.0%
gk.phys.sinica.edu.t       0     0     0     0     0
gridtest01.racf.bnl.       0     0     0     0     0
grow-itb.its.uiowa.e       0     0     0     0     0
ligo-itb.aset.psu.ed       0     0    81     0   280   96.8%
osg-itb.ligo.caltech       0     0     0     0     0    0.0%
osg-vtb.ligo.caltech       0    50     0     0     0   36.2%
osggate.clemson.edu/       0     0     0     0     0    0.0%
osp-vtb00.nersc.gov/    1280     0     0     0     0  100.0%
pc1805.nersc.gov/job       0     0     0     0     0
pdsfgrid1/jobmanager       0     0     0     0     0
t2dev-osg.uchicago.e    1280     0     0     0     0  100.0%
tb10.grid.iu.edu/job       0     0     0     0   461  100.0%
testwulf.hpcc.ttu.ed       0     0     0     0     0    0.0%
TOTALS:                 2561  1285    81    20  1201
PERCENT OF ERRORS:      49.7  25.0   1.6   0.4  23.3
**** These errors were not Globus errors. See below for details.
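
The failure percentages in this summary appear to equal Errors / (Done + Errors) from the Status Summary. That formula is inferred from the numbers and is not stated in the report. A minimal Python sketch under that assumption reproduces several rows; `failure_rate` is a hypothetical helper name.

```python
from typing import Optional


def failure_rate(done: int, errors: int) -> Optional[float]:
    """Percentage of finished jobs that ended in error.

    Assumption: the report computes errors / (done + errors).
    Returns None when no job finished, which matches the blank
    cells for sites that only show timeouts.
    """
    finished = done + errors
    if finished == 0:
        return None
    return round(100.0 * errors / finished, 1)


# (Done, Errors) pairs copied from the Status Summary.
rows = {
    "cms-xen1.fnal.gov": (66, 11),
    "cms-xen9.fnal.gov": (61, 10),
    "ligo-itb.aset.psu.edu": (12, 361),
    "osg-vtb.ligo.caltech.edu": (88, 50),
    "gk.phys.sinica.edu.tw": (0, 0),
}

for site, (done, errors) in rows.items():
    print(site, failure_rate(done, errors))
```

The same arithmetic seems to explain the PERCENT OF ERRORS row: each error-code total divided by the 5148 errors in the grand total.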

Error Details

citgrid3.cacr.caltech.edu/jobmanager-condor
1235 Globus error 17: the job failed when the job manager attempted to run it

cms-xen1.fnal.gov/jobmanager-condor
1 Globus error 7: authentication with the remote server failed
10 Globus error 121: the job state file doesn't exist
10 Grid Resource Back Up
10 Detected Down Globus Resource

cms-xen9.fnal.gov/jobmanager-condor
10 Globus error 121: the job state file doesn't exist
10 Grid Resource Back Up
10 Detected Down Globus Resource

cmsitbsrv01.fnal.gov/jobmanager-condor
No errors

cmssrv09.fnal.gov/jobmanager-condor
No errors

feynman.uits.iupui.edu/jobmanager-condor
460 Unspecified gridmanager error

fgitb-gk.fnal.gov/jobmanager-condor
No errors

gk.phys.sinica.edu.tw/jobmanager-condor
No errors

gridtest01.racf.bnl.gov/jobmanager-condor
10 Detected Down Globus Resource

grow-itb.its.uiowa.edu/jobmanager-pbs
10 Detected Down Globus Resource

ligo-itb.aset.psu.edu/jobmanager-pbs
81 Globus error 79: connecting to the job manager failed. Possible reasons: job terminated, invalid job contact, network problems, ...
280 Unspecified gridmanager error

osg-itb.ligo.caltech.edu/jobmanager-condor
No errors

osg-vtb.ligo.caltech.edu/jobmanager-condor
50 Globus error 17: the job failed when the job manager attempted to run it

osggate.clemson.edu/jobmanager-condor
No errors

osp-vtb00.nersc.gov/jobmanager-sge
1280 Globus error 7: authentication with the remote server failed

pc1805.nersc.gov/jobmanager-sge
10 Detected Down Globus Resource

pdsfgrid1/jobmanager-sge
10 Detected Down Globus Resource

t2dev-osg.uchicago.edu/jobmanager-condor
1280 Globus error 7: authentication with the remote server failed

tb10.grid.iu.edu/jobmanager-condor
461 Unspecified gridmanager error

testwulf.hpcc.ttu.edu/jobmanager-pbs
No errors

This report took 0.1 minutes to generate

------------- End Forwarded Message -------------

Assignees: Operations Workgroup, Arvind Gopu, OSG Support Centers, VDT
Status: Support Agency
Originating VO Support Center: DOSAR
Destination VO Support Center: VDT
Originating Ticket Number:
Destination Ticket Number:

Thank You,
OSG Grid Operations Center
goc@opensciencegrid.org, 317-278-9699
info: http://www.opensciencegrid.org
rss: http://www.grid.iu.edu/news/
> From: hs@nhn.ou.edu
> Subject: grid monitor for GridEx jobs on OUHEP_ITB
> (osgitb1.nhn.ou.edu) still not working
> Date: August 21, 2007 3:28:47 PM EDT
> To: goc@opensciencegrid.org
> Cc: adesmet@cs.wisc.edu
> Reply-To: hs@nhn.ou.edu
>
> Hi all,
>
> GridEx jobs have just been resumed on OUHEP_ITB (osgitb1.nhn.ou.edu),
> and immediately the load went through the roof again and is now at 7.
>
> And when I checked, I saw that the grid_manager_monitor is apparently
> still not working, since there is no such process, but rather
> one globus-job-manager process for each submitted GridEx job.
>
> Can we please get this resolved? It's been like that since at least
> June,
> and we need this testbed for real testing and can't afford to have it
> bogged down with GridEx jobs like this.
>
> Thanks,

I've added Jaime Frey to this ticket in the hope that he can help us
debug the problem.

Jaime, you can view the full ticket (which has a bit more than this
email) at:

<http://vdt.cs.wisc.edu/rt/index.html?user=guest&pass=guest&q=2922>

Could you give us some advice on how to debug why the grid monitor is
not working for Horst's site? These are grid exerciser jobs, and those
are definitely using the grid monitor, so this is a bit mysterious.

Jaime, don't feel that you need to handle the entire ticket: it is
assigned to me. But if you have any advice on where to begin looking for
the problem, that would be greatly appreciated.

Thanks,
-alain
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Fri, 24 Aug 2007 05:11 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/24/2007 at 05:11:07 with the following information:

FootPrints Ticket Description:
[Duplicate message snipped]
CC: osg@tick-indy.globalnoc.iu.edu
Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Fri, 24 Aug 2007 14:13:33 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
On Aug 24, 2007, at 12:10 AM, Alain Roy via RT wrote:

>> GridEx jobs have just been resumed on OUHEP_ITB (osgitb1.nhn.ou.edu),
>> and immediately the load went through the roof again and is now at 7.
>>
>> And when I checked, I saw that the grid_manager_monitor is apparently
>> still not working, since there is no such process, but rather
>> one globus-job-manager process for each submitted GridEx job.
>>
>> Can we please get this resolved? It's been like that since at least
>> June,
>> and we need this testbed for real testing and can't afford to have it
>> bogged down with GridEx jobs like this.
>>
>> Thanks,
>
> I've added Jaime Frey to this ticket in this hope that he can help us
> debug the problem.
>
> Jaime, you can view the full ticket (which has a bit more than this
> email) at:
>
> <http://vdt.cs.wisc.edu/rt/index.html?user=guest&pass=guest&q=2922>
>
> Could you give us some advice on how to debug why the grid monitor is
> not working for Horst's site? These are grid exerciser jobs, and those
> are definitely using the grid monitor, so this is a bit mysterious.
>
> Jaime, don't feel that you need to handle the entire ticket: it is
> assigned to me. But if you have any advice on where to begin
> looking for
> the problem, that would be greatly appreciated.

If the grid monitor fails to report the status of jobs, the Condor
gridmanager will fall back to running a limited number of jobmanager
processes (no more than 10 by default).

The best way to start debugging grid monitor problems is to run it
from the command line. Here are instructions:

Run the following command, substituting as appropriate:
globusrun -s -r <resource>/jobmanager-fork '&(executable=$(GLOBUSRUN_GASS_URL)/<condor path>/sbin/grid_monitor.sh)(arguments="--dest-url="#$(GLOBUSRUN_GASS_URL)#"/tmp/job_status")'

That should all be on one line.

If it's working correctly, it should print out something like this:

2006-01-31 16:21:17 OK:
2006-01-31 16:21:17 INFO: Forced agent start
2006-01-31 16:21:17 INFO: Starting grid_manager_monitor_agent
2006-01-31 16:21:17 INFO: Started grid_manager_monitor_agent as /tmp/grid_manager_monitor_agent.jfrey.18795.1000, pid 18797
2006-01-31 16:21:17 INFO: grid_manager_monitor_agent already running.

and continue to run, printing out an 'OK' line every minute.

Then, /tmp/job_status should appear on your machine and contain
something like this:

1138746108 1138746108
https://nostos.cs.wisc.edu:43462/8588/1137692629/ 8
https://nostos.cs.wisc.edu:43962/8760/1137693090/ 8
GRIDMONEOF

The file should be replaced with a fresh version about every minute.
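
For anyone who wants to check the file automatically, here is a minimal Python sketch of a parser for the layout shown above. It is based only on that sample, so treat the details as assumptions: the two numbers on the first line are taken to be Unix timestamps, each following line is taken to be a job contact followed by a numeric state, and the 180-second staleness threshold is arbitrary. `parse_job_status` and `is_stale` are hypothetical helper names.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

EOF_MARKER = "GRIDMONEOF"


@dataclass
class JobStatus:
    header: Tuple[int, int]                      # assumed Unix timestamps
    states: Dict[str, int] = field(default_factory=dict)
    complete: bool = False                       # True once the EOF marker is seen


def parse_job_status(text: str) -> JobStatus:
    """Parse the contents of a job_status file in the layout shown above."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty job_status file")
    first, second = (int(token) for token in lines[0].split())
    status = JobStatus(header=(first, second))
    for line in lines[1:]:
        if line == EOF_MARKER:
            status.complete = True
            break
        contact, state = line.rsplit(None, 1)    # split off the trailing state
        status.states[contact] = int(state)
    return status


def is_stale(status: JobStatus, now: float, max_age: float = 180.0) -> bool:
    """True if the newest header timestamp is older than max_age seconds."""
    return now - max(status.header) > max_age


sample = """1138746108 1138746108
https://nostos.cs.wisc.edu:43462/8588/1137692629/ 8
https://nostos.cs.wisc.edu:43962/8760/1137693090/ 8
GRIDMONEOF
"""
parsed = parse_job_status(sample)
print(parsed.complete, len(parsed.states))
```

A file that lacks the GRIDMONEOF line, or whose header stops advancing, would suggest the monitor is not refreshing it roughly once a minute as described.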

+--------------------------------+-----------------------------------+
| Jaime Frey | I used to be a heavy gambler. |
| jfrey@cs.wisc.edu | But now I just make mental bets. |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind. |
+--------------------------------+-----------------------------------+
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Fri, 24 Aug 2007 19:14 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/24/2007 at 19:14:11 with the following information:

FootPrints Ticket Description:
[Duplicate message snipped]
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Fri, 24 Aug 2007 19:17 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/24/2007 at 19:17:10 with the following information:

FootPrints Ticket Description:
[Duplicate message snipped]
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 18:02 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/28/2007 at 18:02:08 with the following information:

FootPrints Ticket Description:
Hi Jaime,

thanks for the info, sorry about the delay.

This seems to work fine. I ran this from ouhep5, my desktop, against osgitb1,
the ITB gatekeeper in question, and got the following response:

-----
[hs@ouhep5 hs]$ globusrun -s -r osgitb1.nhn.ou.edu/jobmanager-fork '&(executable=$(GLOBUSRUN_GASS_URL)/usr/local/condor/sbin/grid_monitor.sh)(arguments="--dest-url="#$(GLOBUSRUN_GASS_URL)#"/tmp/job_status")'
/usr/local/opt/osg-0.7.0/apache/lib:/usr/local/opt/osg-0.7.0/MonaLisa/Service/VDTFarm/pgsql/lib:/usr/local/opt/osg-0.7.0/glite/lib:/usr/local/opt/osg-0.7.0/prima/lib:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/server:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/client:/usr/local/opt/osg-0.7.0/mysql/lib/mysql:/usr/local/opt/osg-0.7.0/globus/lib:/usr/local/opt/osg-0.7.0/berkeley-db/lib:/usr/local/opt/osg-0.7.0/expat/lib:/usr/local/opt/osg-0.7.0/apache/lib:/usr/local/opt/osg-0.7.0/MonaLisa/Service/VDTFarm/pgsql/lib:/usr/local/opt/osg-0.7.0/glite/lib:/usr/local/opt/osg-0.7.0/prima/lib:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/server:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/client:/usr/local/opt/osg-0.7.0/mysql/lib/mysql:/usr/local/opt/osg-0.7.0/berkeley-db/lib:/usr/local/opt/osg-0.7.0/expat/lib:
2007-08-28 12:32:29 OK:
2007-08-28 12:32:29 INFO: /usr/local/opt/osg-0.7.0/globus/tmp/gram_job_state/grid_manager_monitor_agent_log.354 missing
Unquoted string "break" may clash with future reserved word at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm line 55.
Useless use of a constant in void context at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm line 55.
// should probably be written as "" at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm line 393.
2007-08-28 12:32:29 INFO: Starting grid_manager_monitor_agent
2007-08-28 12:32:29 INFO: Started grid_manager_monitor_agent as /tmp/grid_manager_monitor_agent.usatlas1.24371.1000, pid 24399
2007-08-28 12:33:29 OK:
2007-08-28 12:34:29 OK:
2007-08-28 12:35:29 OK:
...
-----

And /tmp/job_status does look like you said:

----
1188323129 1188323129
GRIDMONEOF
----

And on osgitb1, I see this in /tmp:

----
[hs@osgitb1 ~]$ ls -ao /tmp/condor-lock.osgitb10.309009174424322/
total 12
drwxr-xr-x 2 condor 4096 Aug 28 10:56 .
drwxrwxrwt 12 root 4096 Aug 28 12:51 ..
-rw------- 1 condor 0 Aug 23 14:25 InstanceLock
prw------- 1 condor 0 Aug 28 10:56 procd_pipe.SCHEDD
prw------- 1 condor 0 Aug 23 15:06 procd_pipe.SCHEDD.watchdog
----

So I didn't get the "Forced agent to start", but otherwise it looks okay,
so it doesn't look like there's a problem on this end, right?

I just tried the same with osgitb1 as the client -- so, from osgitb1
to osgitb1, and I get the same result, so both osg-0.6.0 on SLF3 (ouhep5)
and osg-0.7.0 on RHEL5 (osgitb1) can start a grid_manager_monitor
just fine.

What else can we try to debug this? Can you run a set of GridEx jobs by hand,
and see what that does? What OS version and osg version is the normal
GridEx running on?

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 14:35:47 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>

The job status file that the grid monitor sends back to the client
machine (/tmp/job_status when you run it from the command line)
should contain a line for each job currently submitted to gram under
the same unix uid on the gatekeeper. Your job status file has none.

Can you try submitting a long sleep job to the gatekeeper via Condor-
G before running the grid monitor from the command line? Then we know
that at least one job should show up in the file the grid monitor
sends back. If the file still has no jobs, then we know something's
wrong.
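
The check described above can be done mechanically. The following is a minimal Python sketch that assumes the layout seen in this thread: a header line with two integers (taken here to be Unix timestamps), then one `contact status` line per job, then a closing `GRIDMONEOF` marker. The function name `parse_job_status` is hypothetical, and the status number is kept as a plain integer rather than decoded.

```python
def parse_job_status(text):
    """Parse grid monitor status text into (header, jobs, complete).

    header   -- tuple of the integers on the first line, or None
    jobs     -- list of (contact, status) pairs, one per job line
    complete -- True if the text ends with the GRIDMONEOF marker
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    complete = bool(lines) and lines[-1] == "GRIDMONEOF"
    if complete:
        lines = lines[:-1]
    if not lines:
        return None, [], complete
    header = tuple(int(field) for field in lines[0].split())
    jobs = []
    for ln in lines[1:]:
        # The contact string and the status are separated by whitespace.
        contact, status = ln.rsplit(None, 1)
        jobs.append((contact, int(status)))
    return header, jobs, complete


if __name__ == "__main__":
    with open("/tmp/job_status") as handle:
        header, jobs, complete = parse_job_status(handle.read())
    print("complete:", complete, "jobs listed:", len(jobs))
```

A file with a header and `GRIDMONEOF` but zero job lines, like the one reported above, parses as complete with an empty job list, which is the case that needs explaining when jobs are known to be submitted.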

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 19:44 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>

FootPrints Ticket Description:
Jaime/VDT,

It looks like you wrote a reply, but the note was snipped completely, most likely
because you wrote your response below the line that says "When replying, type your
text above this line." I found your latest note on the VDT ticket and copied and
pasted it below. Please write your notes *above* that line in future correspondence
that involves the GOC ticket system. [I know it's a wee bit annoying, but I have no
control over that behavior.] Thanks!

Arvind

---------

Jaime's note:
The job status file that the grid monitor sends back to the client
machine (/tmp/job_status when you run it from the command line)
should contain a line for each job currently submitted to gram under
the same unix uid on the gatekeeper. Your job status file has none.

Can you try submitting a long sleep job to the gatekeeper via Condor-
G before running the grid monitor from the command line? Then we know
that at least one job should show up in the file the grid monitor
sends back. If the file still has no jobs, then we know something's
wrong.

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 20:26 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>

FootPrints Ticket Description:
Hi Jaime,

okay, I have some more info.

After I started a sleep job from ouhep5 on osgitb1 via Condor-G,
I ran the monitor again, and this time it crashed:

-----
[hs@ouhep5 hs]$ globusrun -s -r osgitb1.nhn.ou.edu/jobmanager-fork '&(executable=$(GLOBUSRUN_GASS_URL)/usr/local/condor/sbin/grid_monitor.sh)(arguments="--dest-url="#$(GLOBUSRUN_GASS_URL)#"/tmp/job_status")'
/usr/local/opt/osg-0.7.0/apache/lib:/usr/local/opt/osg-0.7.0/MonaLisa/Service/VDTFarm/pgsql/lib:/usr/local/opt/osg-0.7.0/glite/lib:/usr/local/opt/osg-0.7.0/prima/lib:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/server:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/client:/usr/local/opt/osg-0.7.0/mysql/lib/mysql:/usr/local/opt/osg-0.7.0/globus/lib:/usr/local/opt/osg-0.7.0/berkeley-db/lib:/usr/local/opt/osg-0.7.0/expat/lib:/usr/local/opt/osg-0.7.0/apache/lib:/usr/local/opt/osg-0.7.0/MonaLisa/Service/VDTFarm/pgsql/lib:/usr/local/opt/osg-0.7.0/glite/lib:/usr/local/opt/osg-0.7.0/prima/lib:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/server:/usr/local/opt/osg-0.7.0/jdk1.5/jre/lib/i386/client:/usr/local/opt/osg-0.7.0/mysql/lib/mysql:/usr/local/opt/osg-0.7.0/berkeley-db/lib:/usr/local/opt/osg-0.7.0/expat/lib:
2007-08-28 15:09:38 OK:
2007-08-28 15:09:38 INFO: Forced agent start
2007-08-28 15:09:38 INFO: Starting grid_manager_monitor_agent
Unquoted string "break" may clash with future reserved word at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm line 55.
Useless use of a constant in void context at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm line 55.
// should probably be written as "" at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm line 393.
Can't locate object method "new" via package "Globus::GRAM::JobManager::condor" at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm line 29.
2007-08-28 15:09:38 INFO: Started grid_manager_monitor_agent as /tmp/grid_manager_monitor_agent.usatlas1.5908.1000, pid 5930
2007-08-28 15:09:39 ERROR: 8: grid_manager_monitor_agent (pid 5930) exited with a 255 result (65280).
-----
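
The two numbers in that last ERROR line are consistent with each other if 65280 is a raw Unix wait status, which is what Perl reports in `$?`: the exit code sits in the high byte and any terminating signal in the low seven bits. A minimal Python sketch of that arithmetic follows; the assumption that the agent reports a raw wait status is mine, and the helper name `decode_wait_status` is hypothetical.

```python
def decode_wait_status(status):
    """Split a 16-bit Unix wait status into (exit_code, signal_number)."""
    exit_code = (status >> 8) & 0xFF   # high byte: value passed to exit()
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_number


if __name__ == "__main__":
    print(decode_wait_status(65280))
```

For 65280 this gives an exit code of 255 and no signal, so the agent exited on its own rather than being killed. An exit code of 255 is what a Perl script commonly returns after dying on an uncaught error, which fits the "Can't locate object method" message printed just before it.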

Then I started it again, and then it ran for a while, and produced
some output:

-----
[hs@ouhep5 hs]$ cat /tmp/job_status
1188332060 1188332060
https://osgitb1.nhn.ou.edu:63015/7990/1188332042/ 32
GRIDMONEOF
-----

But then it crashed again with the same error.

And when I submitted the monitor from osgitb1, it additionally
gave me this error:

-----
Can't locate object method "new" via package "Globus::GRAM::JobManager::condor"
at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm
line 29.
-----

But the /tmp/job_status on osgitb1 also looked the same as on ouhep5,
so it picked up the job, too.

Does that tell you anything?

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 16:23:09 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
If the grid monitor is regularly crashing with errors like these,
that would explain the behavior that was reported. When the grid
monitor fails, the Condor gridmanager will restart up to 10
jobmanagers, which will increase the load on the CE.

The error to investigate is this one:
Can't locate object method "new" via package
"Globus::GRAM::JobManager::condor"
at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/
condor.pm
line 29.

This may be related to a problem I saw earlier this month. The grid
monitor was failing at LTU because it was using the system-installed
perl and the standard perl library path, which was missing a critical
module. Globus was using an OSG-installed perl with its own library
path, which had the module.
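
One way to test for that kind of mismatch is to ask each perl interpreter whether it can load a given module. The Python sketch below only wraps the standard `perl -I<dir> -M<module> -e 1` invocation. The helper names are hypothetical, and `/usr/bin/perl` as the system perl, `Globus::GRAM::JobManager` as the module to try, and the reuse of the Globus library directory from the error message are all assumptions for illustration.

```python
import subprocess


def module_check_argv(perl, module, lib_dirs=()):
    """Build the argv for `perl -I<dir>... -M<module> -e 1`."""
    argv = [perl]
    for directory in lib_dirs:
        argv.append("-I" + directory)
    argv += ["-M" + module, "-e", "1"]
    return argv


def module_loads(perl, module, lib_dirs=()):
    """Return True if `perl` can load `module` with the given library dirs."""
    try:
        result = subprocess.run(
            module_check_argv(perl, module, lib_dirs),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    except OSError:
        # The interpreter itself could not be started.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Library directory taken from the error message in this ticket.
    globus_lib = "/usr/local/opt/osg-0.7.0/globus/lib/perl"
    module = "Globus::GRAM::JobManager"  # assumed module to test
    print("system perl, default path:", module_loads("/usr/bin/perl", module))
    print("system perl, Globus path: ", module_loads("/usr/bin/perl", module, [globus_lib]))
```

If the module loads only when the Globus library directory is added, the grid monitor is probably running with a different perl or library path than Globus itself uses.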

-- Jaime


Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 21:50 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>

FootPrints Ticket Description:
Hi Jaime,

> The error to investigate is this one:
> Can't locate object method "new" via package
> "Globus::GRAM::JobManager::condor"
> at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/
> condor.pm
> line 29.

so how do we investigate this error? Anything we can do to help?

This error only happened when I submitted the grid monitor via globusrun
from osgitb1. Do you think the error when submitting from ouhep5
was the same, but wasn't printed because of the older osg version
(0.6 vs. 0.7)?

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 18:50:41 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
Your previous email said the error was printed in both cases (grid
monitor job submitted from osgitb1 and ouhep5).

-- Jaime


Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 00:47 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>

FootPrints Ticket Description:
Hi Jaime,

> Your previous email said the error was printed in both cases (grid
> monitor job submitted from osgitb1 and ouhep5).

no it didn't. =)

This is what it said:

==========
After I started a sleep job from ouhep5 on osgitb1 via Condor-G,
I ran the monitor again, and this time it crashed:

-----
[hs@ouhep5 hs]$ globusrun -s -r osgitb1.nhn.ou.edu/jobmanager-fork ...

[...]

2007-08-28 15:09:39 ERROR: 8: grid_manager_monitor_agent (pid 5930) exited with
a 255 result (65280).
-----

Then I started it again, and then it ran for a while, and produced
some output:

-----
[hs@ouhep5 hs]$ cat /tmp/job_status
1188332060 1188332060
https://osgitb1.nhn.ou.edu:63015/7990/1188332042/ 32
GRIDMONEOF
-----

But then it crashed again with the same error.

And when I submitted the monitor from osgitb1, it additionally
gave me this error:

-----
Can't locate object method "new" via package
"Globus::GRAM::JobManager::condor"
at /usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm
line 29.
-----
==========

Note the 'additionally'. :)

So the "exited with a 255 result (65280)" error happened when I submitted
the grid monitor from both machines, but the "Can't locate object method"
error only happened when I submitted the grid monitor from osgitb1,
i.e., the new osg-0.7.0 version.

Unfortunately I don't completely understand the command I ran,
so I'm not sure which parts of it were run on the client vs. the gatekeeper,
so I'm not sure how else to debug this. Any hints greatly appreciated. :^)
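
On the client-versus-gatekeeper question, it may help to see the RSL string from the globusrun command taken apart. The Python sketch below rebuilds it piece by piece; the function name `grid_monitor_rsl` is hypothetical. As far as I understand `globusrun -s`, it starts a GASS server on the submitting machine and substitutes that server's URL for `$(GLOBUSRUN_GASS_URL)`, so both paths in the string refer to files on the client, while the script itself is executed on the gatekeeper by the fork jobmanager.

```python
# Placeholder that globusrun replaces with the URL of the client-side GASS server.
GASS = "$(GLOBUSRUN_GASS_URL)"


def grid_monitor_rsl(script_on_client, status_file_on_client):
    """Rebuild the RSL used in this ticket's globusrun test command."""
    # Fetched from the client through GASS, then run on the gatekeeper.
    executable = GASS + script_on_client
    # Where the running script sends its status file: back to the client.
    arguments = '"--dest-url="#' + GASS + '#"' + status_file_on_client + '"'
    return "&(executable=" + executable + ")(arguments=" + arguments + ")"


if __name__ == "__main__":
    print(grid_monitor_rsl("/usr/local/condor/sbin/grid_monitor.sh",
                           "/tmp/job_status"))
```

Read this way, `grid_monitor.sh` comes from the Condor installation on whichever machine runs globusrun, and `/tmp/job_status` is written on that same machine, but the monitor agent and the Perl modules it loads live on osgitb1.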

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 20:24:12 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
Look two lines above the 'ERROR: 8: grid_manager_monitor_agent (pid
5930) exited with a 255 result (65280).' in your previous email. You'll
see the same 'additional' error message. So it appears to happen if you
submit the grid monitor job from either machine (ouhep5 or osgitb1).

-- Jaime

On Aug 28, 2007, at 7:51 PM, osg@tick-indy.globalnoc.iu.edu via RT
wrote:

> [Duplicate message snipped]

+--------------------------------+-----------------------------------+
| Jaime Frey | I used to be a heavy gambler. |
| jfrey@cs.wisc.edu | But now I just make mental bets. |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind. |
+--------------------------------+-----------------------------------+
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 01:44 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
FootPrints Ticket Description:
[Duplicate message snipped]
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 01:59 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/29/2007 at 01:59:07 with the following information:

FootPrints Ticket Description:
Hi Jaime,

okay, so I'm blind. =)

So I guess the two errors were just swapped? Presumably because of
intermixing of stdout and stderr?
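
One way to rule out interleaving is to capture the two streams in separate files and read each on its own. A minimal sketch, assuming a POSIX shell; the helper name run_split and the log file prefix are illustrative and not part of Globus or the VDT:

```shell
#!/bin/sh
# Hypothetical helper (not part of Globus or the VDT): run a command
# with stdout and stderr captured in separate files, so the order of
# messages within each stream is unambiguous.
run_split() {
    prefix=$1
    shift
    "$@" >"$prefix.out" 2>"$prefix.err"
}

# Example (the globusrun arguments are elided in the original report):
#   run_split /tmp/gridmon globusrun -s -r osgitb1.nhn.ou.edu/jobmanager-fork ...
```

Comparing /tmp/gridmon.out with /tmp/gridmon.err afterwards shows which messages came from which stream.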

So now that you showed me the light, what else can I do to help
track this down and fix it, so that we can run GridEx jobs again?

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 28 Aug 2007 22:33:32 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
Would it be possible for me to get a login on the CE? I think that'd
be the most efficient way to proceed.

-- Jaime

On Aug 28, 2007, at 9:21 PM, osg@tick-indy.globalnoc.iu.edu via RT
wrote:

> [Duplicate message snipped]

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 03:47 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
FootPrints Ticket Description:
[Duplicate message snipped]
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 15:37 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/29/2007 at 15:37:20 with the following information:

FootPrints Ticket Description:
Quick update: The latest issue Horst has reported about unexpected interaction
between his existing Condor and RSV's condor-devel has been taken offline, on a
separate thread. -Arvind

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 15:23 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Download (untitled) / with headers
text/plain 5.5k
When replying, type your text above this line.
----------------------------------------------
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 08/29/2007 at 15:23:08 with the following information:

FootPrints Ticket Description:
Hi Jaime,

> Would it be possible for me to get a login on the CE? I think that'd
> be the most efficient way to proceed.

sure, if you give me your rsa public key, I can add that to the
usatlas1 user on osgitb1.nhn.ou.edu, and then you can play around locally.

By the way, I think there's still something fishy with Condor-Devel/OSG-RSV
setup on osg-0.7.0, even after the last update. Occasionally, just out of
the blue, our regular Condor installation (6.8.4) loses track of the
real jobs with 'condor_q -g', and picks up the RSV jobs instead.

These next two commands were issued back to back, with no delay in between,
nor anything else:

=====
[hs@osgitb1 ~]$ condor_q -g

-- Schedd: osgitb1.nhn.ou.edu : <129.15.31.41:63325>
ID      OWNER     SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
3748.0  usatlas1  8/29 09:38   0+00:00:00  I   0    9.8   env

1 jobs; 1 idle, 0 running, 0 held
[hs@osgitb1 ~]$ condor_q -g

-- Schedd: osgitb1.nhn.ou.edu : <129.15.31.41:57699>
ID      OWNER     SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
56.0    mis       8/28 18:24   0+08:03:03  R   0    9.8   probe_wrapper.pl /
57.0    mis       8/28 18:24   0+00:23:43  I   0    9.8   probe_wrapper.pl /
58.0    mis       8/28 18:24   0+00:23:46  I   0    9.8   probe_wrapper.pl /
59.0    mis       8/28 18:24   0+00:23:47  I   0    9.8   probe_wrapper.pl /
60.0    mis       8/28 18:24   0+00:23:48  I   0    9.8   probe_wrapper.pl /
61.0    mis       8/28 18:24   0+00:23:49  I   0    9.8   probe_wrapper.pl /
62.0    mis       8/28 18:24   0+00:23:49  I   0    9.8   probe_wrapper.pl /
63.0    mis       8/28 18:24   0+00:23:45  I   0    9.8   probe_wrapper.pl /
64.0    mis       8/28 18:24   0+12:04:29  R   0    9.8   probe_wrapper.pl /
65.0    mis       8/28 18:24   0+00:23:45  I   0    9.8   probe_wrapper.pl /
66.0    mis       8/28 18:24   0+00:23:47  I   0    9.8   probe_wrapper.pl /
67.0    mis       8/28 18:24   0+00:23:48  I   0    9.8   probe_wrapper.pl /
68.0    mis       8/28 18:24   0+00:23:51  I   0    9.8   probe_wrapper.pl /
69.0    mis       8/28 18:24   0+00:39:45  I   0    9.8   gratia-script-cons
70.0    mis       8/28 18:24   0+00:40:15  I   0    9.8   sample-consumer /u

15 jobs; 13 idle, 2 running, 0 held
=====

I didn't change CONDOR_LOCATION or CONDOR_CONFIG or anything,
so it should've never picked up the RSV jobs like this.
It just suddenly started talking to the condor-devel daemon?

Then later I tried it again, and it still picked up the wrong ones:

=====
[hs@osgitb1 ~]$ condor_q -g

-- Schedd: osgitb1.nhn.ou.edu : <129.15.31.41:57699>
ID      OWNER     SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
56.0    mis       8/28 18:24   0+08:14:43  R   0    9.8   probe_wrapper.pl /
57.0    mis       8/28 18:24   0+00:23:43  I   0    9.8   probe_wrapper.pl /
58.0    mis       8/28 18:24   0+00:23:46  I   0    9.8   probe_wrapper.pl /
59.0    mis       8/28 18:24   0+00:23:47  I   0    9.8   probe_wrapper.pl /
60.0    mis       8/28 18:24   0+00:23:48  I   0    9.8   probe_wrapper.pl /
61.0    mis       8/28 18:24   0+00:23:49  I   0    9.8   probe_wrapper.pl /
62.0    mis       8/28 18:24   0+00:23:49  I   0    9.8   probe_wrapper.pl /
63.0    mis       8/28 18:24   0+00:23:45  I   0    9.8   probe_wrapper.pl /
64.0    mis       8/28 18:24   0+12:19:03  R   0    9.8   probe_wrapper.pl /
65.0    mis       8/28 18:24   0+00:23:45  I   0    9.8   probe_wrapper.pl /
66.0    mis       8/28 18:24   0+00:23:47  I   0    9.8   probe_wrapper.pl /
67.0    mis       8/28 18:24   0+00:23:48  I   0    9.8   probe_wrapper.pl /
68.0    mis       8/28 18:24   0+00:23:51  I   0    9.8   probe_wrapper.pl /
69.0    mis       8/28 18:24   0+00:40:55  I   0    9.8   gratia-script-cons
70.0    mis       8/28 18:24   0+00:41:32  R   0    9.8   sample-consumer /u

15 jobs; 12 idle, 3 running, 0 held
=====

But then I submitted another test job to osgitb1/jobmanager-condor,
and then all of a sudden it picked up the right one again:

=====
[hs@osgitb1 ~]$ condor_q -g

-- Schedd: osgitb1.nhn.ou.edu : <129.15.31.41:63325>
ID      OWNER     SUBMITTED    RUN_TIME    ST  PRI  SIZE  CMD
3749.0  usatlas1  8/29 09:57   0+00:00:00  I   0    9.8   env

1 jobs; 1 idle, 0 running, 0 held
=====

How can that be?

I'm not sure if this has anything to do with our grid manager problem,
but it certainly shouldn't happen, right?

How do I know which condor daemon condor_q or condor_status talk to?
I have no *CONDOR* env vars set, and it picks up the binaries in
/usr/local/bin/, which are soft links to /usr/local/condor/bin/,
which are the regular 6.8.4 installation, so there should be no way
to suddenly talk to the condor-devel daemons, right?
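
The "-- Schedd:" header that condor_q prints already identifies the daemon that answered: in the listings above, port 63325 belongs to the schedd holding the usatlas1 jobs and port 57699 to the one holding the RSV (mis) jobs. A minimal sketch for logging that address on each call, assuming a POSIX shell with sed; the helper name schedd_addr is illustrative and not a Condor tool:

```shell
#!/bin/sh
# Hypothetical helper: print the address (ip:port) from the
# "-- Schedd: <name> : <ip:port>" header of condor_q output on stdin.
schedd_addr() {
    sed -n 's/^-- Schedd: .* : <\([^>]*\)>.*$/\1/p'
}

# Example: condor_q -g | schedd_addr
```

Running this periodically and noting when the printed port changes would narrow down what triggers the switch between the two daemons.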

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 11:27:59 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
Here is my public key:

ssh-dss AAAAB3NzaC1kc3MAAAEBAKFQpOeVNX/16RfGAALXQ+pwosdlcMzBUtY0Dn6
+YgVgJXq/9mfCdGXuj5OzK2wjO5l1O71drcOjtYu9CvD0rrtzKp5B5xWZU/
pd4f2d11waSIgj0trEGHAMG+VQ69wjBwjR81YPOkI2HcEqbEGGWFS69iIu3yt/X/
09wxwdOFpEmUKnjxCLD2PS/VlXydgLjdXq6+nUUz/
RFHv2Jbtbff8nSGW6SFdP424YwFazClMYhG8kKAtfSm0uL6bhzFs1ysOhRqHIYmu7w595brI
BHsqdeZXwPlwOc4roLH9W99q7Zzozt9v+OTwNs9RWBa5/qnzZOA1mqms5GQySoqM/
+HsAAAAVAMhR5pJ+m9v/
O7RYqbbe0v2fqS2BAAABABIspAFuOztfIXFh6o2C0vwbVNo10rbTC7bcvzAHu5C/
SoemSqfiKSG9UdTWqM6u8Hw8k1StVK1GGcoh
+wfUksT1r6PCykTC6uO5FqUIYWEVT8ILf0e/
+DjcuVSUw4jpGhs3hu28onqdKlZHqrnOc4q7ZjZ8+j8aGXnm/
xosrtWz7vhJV15TtKLdpc3hDcaBgdK95JYmBPDhrLRKExRHoOh0Emg07wzfxpr/
ECzXFiKf6DgO7LkeswgknXTrPahRbN2GUNmJKDWq8jVhvRASNendHaNmwjGcZnxBvmpuzuDG
/YHcz6BCCqGZlWQakk3NiDnGX3je0mdWkeM0tzK1EYIAAAEAZUKgsTxjr
+hrDwiPPQ5NzTO+3/
IgQYlQs3a6x2GsteSI8PDHbT6TKUPXu0wlgIdDhRzezJOIOPStU8geewAdFIzh1aI7E96L3R
bvH+xNWk6Q7kGhcCWcqk7IDjL59YLn/JIQnq/5FQGgWgeUzP83jnhIJ/
SqwAnPPWBu0fLZ1UIOXcHDvAQcqondSKB9bEkz0tM44Be/
q3R8KkyZi1DOX4TBXtodoCFenLQGkaA/
NIJbyCjajzYhjuaMC40CHf4W1pagPhzxtT0uDWiRcUrG43EftAVFy27mnq7IrrbXYU
+uGfBlPSOAWL0XQlyNE93aAxj8lgnHa3v0e5qyDjZvTw== jfrey@nostos.cs.wisc.edu

Could you also add my X509 DN to the grid-mapfile for user usatlas1:
/DC=org/DC=doegrids/OU=People/CN=James Frey 259919
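
For reference, a grid-mapfile entry is the certificate DN in double quotes followed by the local account name. A minimal sketch that formats such a line, assuming a POSIX shell; the helper name gridmap_entry is illustrative, and where the site keeps its grid-mapfile (and whether it is regenerated automatically) is left to the administrator:

```shell
#!/bin/sh
# Hypothetical helper: format one grid-mapfile entry, i.e. the
# certificate DN in double quotes followed by the local account name.
gridmap_entry() {
    printf '"%s" %s\n' "$1" "$2"
}

gridmap_entry "/DC=org/DC=doegrids/OU=People/CN=James Frey 259919" usatlas1
```

The quotes matter because the DN contains spaces.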

-- Jaime

On Aug 29, 2007, at 11:22 AM, osg@tick-indy.globalnoc.iu.edu via RT
wrote:

> [Duplicate message snipped]

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 29 Aug 2007 16:32 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
FootPrints Ticket Description:
[Duplicate message snipped]
Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Fri, 31 Aug 2007 17:32:16 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
I've figured out part of what's going on, although I don't fully
understand it yet. When the Globus job-manager calls out to its perl
modules, the VDT sneaks in a ". <vdt>/setup.sh" before perl is
launched. This sets a bunch of environment variables. The Grid
Monitor calls into the job-manager's perl modules directly, missing
the ". <vdt>/setup.sh". Without the environment variables that
setup.sh sets, the perl modules fail with this error:

Can't locate object method "new" via package
"Globus::GRAM::JobManager::condor" at
/usr/local/opt/osg-0.7.0/globus/lib/perl/Globus/GRAM/JobManager/condor.pm
line 29.

Specifically, LD_LIBRARY_PATH and PERL5LIB are the environment
variables that determine the success or failure of the job-manager
perl modules. Both need to be set appropriately for the code to work
correctly. For LD_LIBRARY_PATH, it needs to contain the expat library
directory installed by the VDT.

If there are no gram job state files, then the failing codepath isn't
hit, which is why the Grid Monitor may run for a while without dying.
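
To see what setup.sh contributes, the two variables can be printed from a subshell that sources it, which leaves the login shell untouched. A minimal sketch, assuming a POSIX shell; the helper name show_setup_env is illustrative, and the setup.sh path in the example is an assumption based on the install prefix in the error message:

```shell
#!/bin/sh
# Hypothetical helper: print the values of LD_LIBRARY_PATH and PERL5LIB
# as seen after sourcing a setup script in a subshell, so the calling
# shell's own environment is left untouched.
show_setup_env() {
    (
        . "$1" >/dev/null 2>&1
        printf 'LD_LIBRARY_PATH=%s\n' "${LD_LIBRARY_PATH:-}"
        printf 'PERL5LIB=%s\n' "${PERL5LIB:-}"
    )
}

# Example (the setup.sh location is an assumption; adjust to the local install):
#   show_setup_env /usr/local/opt/osg-0.7.0/setup.sh
```

Comparing that output with the same two variables in a shell that has not sourced setup.sh shows what the Grid Monitor is missing.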

-- Jaime

On Aug 29, 2007, at 11:35 AM, osg@tick-indy.globalnoc.iu.edu via RT
wrote:

> [Duplicate message snipped]

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Fri, 31 Aug 2007 22:35 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
FootPrints Ticket Description:
[Duplicate message snipped]
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Sat, 01 Sep 2007 06:56 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
This message is to let you know that Open Science Grid ticket 4004 "grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not working" which is assigned to you, was updated on 09/01/2007 at 06:56:08 with the following information:

FootPrints Ticket Description:
Hi Jaime,

interesting; well, that's progress.
So how can we fix this, so that the Grid Monitor will work properly
without dying?

And why does this not happen anywhere else? Is it related to the fact that
RHEL5 is using a newer perl version, which may not have the required
methods?

Wait -- when you setup OSG on osgitb1, you still get perl 5.8.8,
even though the OSG comes with 5.8.0, right? So it looks like there's
some mismatch even in the regular OSG setup, not just the Grid Monitor?

I used 'perl -version' and 'perl -V' both before and after the OSG setup,
but I don't really understand all the output, so you may want to
try that yourself and see what you get.

Thanks a lot,
good night,
and a good long weekend,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 04 Sep 2007 15:59:04 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
Starting with version 4.0.5, Globus requires perl's XML::Parser
module. This in turn requires libexpat. The VDT provides both and
ensures they are found when the job-manager calls out to its perl
modules. The grid monitor doesn't know about the VDT-installed
libraries and uses the system libraries. If XML::Parser and libexpat
are installed as part of the system, everything works. If they
aren't, the grid monitor will fail.

Alain Roy and I have developed a patch to
$GL/lib/perl/Globus/GRAM/JobManager/fork.pm to point the grid monitor
(and any other fork jobmanager jobs) to the VDT-installed libraries.
I've attached the patch. Can you try applying it on osgitb1.nhn.ou.edu?
Then I can test whether the patch makes the grid monitor work.

-- Jaime
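
[Editor's illustration] One way to check the condition described above (whether perl can load XML::Parser in a given environment) is to run `perl -MXML::Parser -e 1` and look at the exit status. The Python helper below is a sketch under assumptions: the name `perl_can_load` is hypothetical, and a successful load is only a hint that the grid monitor's perl calls would find the module and its libexpat dependency. It does not reproduce what the grid monitor itself does.

```python
import subprocess


def perl_can_load(module, env=None, perl="perl"):
    """Return True if `perl -M<module> -e 1` exits successfully.

    env is the environment for the child process (None inherits the
    current one). A missing perl binary counts as failure.
    """
    try:
        result = subprocess.run(
            [perl, "-M" + module, "-e", "1"],
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # perl could not be started at all.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    ok = perl_can_load("XML::Parser")
    print("XML::Parser loads with the current environment:", ok)
```

Running it once with the plain system environment and once with the VDT directories added to PERL5LIB and LD_LIBRARY_PATH would show whether the system installation alone is enough.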


+--------------------------------+-----------------------------------+
| Jaime Frey | I used to be a heavy gambler. |
| jfrey@cs.wisc.edu | But now I just make mental bets. |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind. |
+--------------------------------+-----------------------------------+
Download fork.patch
application/octet-stream 590b

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 04 Sep 2007 23:20 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Hi Jaime and Alain,

thanks, I just applied the patch, and it seems to work. At least the
grid monitor isn't crashing anymore. It keeps happily running as

----
globusrun -s -r osgitb1.nhn.ou.edu/jobmanager-fork '&(executable=$(GLOBUSRUN_GASS_URL)/usr/local/condor/sbin/grid_monitor.sh)(arguments="--dest-url="#$(GLOBUSRUN_GASS_URL)#"/tmp/job_status")'
----

on ouhep5.

But now /tmp/job_status on the condor-g submit host doesn't seem to be
picking up the running jobs anymore, it's empty:

----
[hs@ouhep5 hs]$ cat /tmp/job_status
1188947412 1188947412
GRIDMONEOF
----

Whereas I have 7 condor jobs running with

----
[hs@ouhep5 hs]$ globus-job-run osgitb1/jobmanager-condor -np 7 /bin/sleep 150
----

which are running fine:

----
[hs@osgitb1 ~]$ condor_q

-- Submitter: osgitb1.nhn.ou.edu : <129.15.31.41:63325> : osgitb1.nhn.ou.edu
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
4652.0 usatlas1 9/4 18:10 0+00:02:23 R 0 9.8 sleep 150
4652.1 usatlas1 9/4 18:10 0+00:02:21 R 0 9.8 sleep 150
4652.2 usatlas1 9/4 18:10 0+00:02:19 R 0 9.8 sleep 150
4652.3 usatlas1 9/4 18:10 0+00:02:17 R 0 9.8 sleep 150
4652.4 usatlas1 9/4 18:10 0+00:02:15 R 0 9.8 sleep 150
4652.5 usatlas1 9/4 18:10 0+00:02:13 R 0 9.8 sleep 150
4652.6 usatlas1 9/4 18:10 0+00:02:11 R 0 9.8 sleep 150

7 jobs; 0 idle, 7 running, 0 held
----

Is that expected?

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Thu, 06 Sep 2007 14:30:08 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
The patch we gave you had a bug in it. I've attached a corrected
patch. Can you apply this new patch to
$GL/lib/perl/Globus/GRAM/JobManager/fork.pm?

-- Jaime

Download fork.patch
application/octet-stream 633b

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Thu, 06 Sep 2007 22:50 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Hi Jaime,

thanks, I just applied your new patch, and this seems to work now:
when I run the grid monitor by hand after submitting a jobmanager-condor
job, the job now shows up in /tmp/job_status:

1189106547 1189106547
https://osgitb1.nhn.ou.edu:63003/23116/1189105354/ 2
GRIDMONEOF

So Alan D., could you please re-enable osgitb1 in the GridEx and see if
it now behaves better?

Thanks a lot,

Horst
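
[Editor's illustration] The two /tmp/job_status samples in this thread suggest a simple layout: a first line with two integer timestamps, then one line per job holding a job contact URL and a number, then a closing GRIDMONEOF line. The Python sketch below parses that layout. The layout is inferred from those two samples only, the function name is hypothetical, and the meaning of the fields is an assumption (if the number is a GRAM job state code, 2 would correspond to ACTIVE, but the thread does not say so).

```python
def parse_job_status(text):
    """Parse grid monitor status text into (timestamps, jobs, complete).

    Layout assumed from the two samples in this thread:
      line 1      two integer timestamps
      lines 2..n  "<job contact URL> <numeric status>"
      last line   GRIDMONEOF
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # The file counts as complete only if the end marker is present.
    complete = bool(lines) and lines[-1] == "GRIDMONEOF"
    if complete:
        lines.pop()
    timestamps = tuple(int(t) for t in lines[0].split()) if lines else ()
    jobs = {}
    for line in lines[1:]:
        contact, status = line.rsplit(None, 1)
        jobs[contact] = int(status)
    return timestamps, jobs, complete


if __name__ == "__main__":
    sample = (
        "1189106547 1189106547\n"
        "https://osgitb1.nhn.ou.edu:63003/23116/1189105354/ 2\n"
        "GRIDMONEOF\n"
    )
    print(parse_job_status(sample))
```

An empty job dictionary together with `complete == True` matches the earlier symptom: the monitor ran to completion but reported no jobs.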

Subject: [vdt-support #2922] SVN commit, rev 6605
To: vdt-support@cs.wisc.edu
From: roy@cs.wisc.edu
Commit comment:
We patch the fork job manager to set PERL5LIB and LD_LIBRARY_PATH so
that the Condor grid monitor (which invokes some scripts directly,
without using our wonderful hack below) can find XML/Parser and
expat.


Changed files:
U vdt/branches/vdt-1.8/Globus-Base-Jobmanager-Common/Globus-Base-Jobmanager-Common.pacman

To generate a diff:
svn diff -c 6605 file:///p/vdt/workspace/svn
Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Fri, 07 Sep 2007 22:11 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
> Hi Jaime,
>
> thanks, I just applied your new patch, and this seems to work now,
> since running the grid monitor by hand after submitting a
> jobmanager-condor job now shows up in /tmp/job_status:

This is great! Thanks for patiently helping us to debug the problem, Horst.

VDT 1.8.1 now applies this patch during installation, so I think the
problem is solved.

Thanks again,
-alain

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Sat, 08 Sep 2007 06:20 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Hi Jaime and Alain,

well, unfortunately this didn't solve our GridEx problem. :(
GridEx jobs are running again, and still no grid monitor process,
but 30 globus-job-manager processes.

Jaime, could you have a look at osgitb1 and see if you can figure out
why the grid monitor is still not running? Or let me know where else
I can look?

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Mon, 10 Sep 2007 10:24:13 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
Strange. Looking at the code in fork.pm again, I don't see how the
patch could have worked on your system. It references a non-existent
variable. Here's yet another revised patch. Can you apply it on
osgitb1.nhn.ou.edu? Meanwhile, I'll talk with Alain to see if there's
a difference between your fork.pm and the one we based the patch on.

-- Jaime

Download fork.patch
application/octet-stream 638b

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Mon, 10 Sep 2007 22:38 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Hi Jaime,

good point, I wasn't paying attention to the patch, I just applied it. =)

So I just applied your latest patch, and then I condor_rm'd all gridex jobs
and cleaned out the gass cache, just to be sure, and the load went down
to 0.5 or so, but as soon as the new jobs came in, the load went through
the roof again, and still no running grid monitor. :(

What else can we try?

Thanks a lot,

Horst

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Mon, 10 Sep 2007 22:44 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Wait -- I just looked again, and after 5 or 10 minutes,
the grid monitor is now running, and the load went down to 1!

So it seems like the patch did the trick after all!
Why did it take a few minutes for this new batch of gridex jobs
to start the grid monitor, though?

Anyway, I'll monitor the situation till tomorrow morning,
but looks like we solved it!

Thanks a lot,

Horst

Subject: Re: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Mon, 10 Sep 2007 18:01:06 -0500
To: vdt-support@OPENSCIENCEGRID.ORG
From: Jaime Frey <jfrey@cs.wisc.edu>
Condor-G submits the real jobs in parallel with the grid monitor job.
Once the grid monitor starts reporting back to Condor-G, Condor-G
kills the jobmanagers of the real jobs.

-- Jaime
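
[Editor's illustration] The behaviour described above implies a simple health check on the gatekeeper: shortly after a batch of jobs arrives there may be many globus-job-manager processes, but once the grid monitor is reporting, a monitor process should be visible and the job-manager count should drop. The Python sketch below counts both in a text listing of processes, the same check Horst did by eye. The function name and the sample listing are hypothetical. The process-name substrings are taken from names mentioned in this thread and may not match every installation.

```python
def summarize_processes(ps_output,
                        manager_marker="globus-job-manager",
                        monitor_markers=("grid_monitor", "grid_manager_monitor")):
    """Count job managers and detect the grid monitor in process-list text.

    Returns (number_of_job_manager_lines, monitor_seen).
    """
    job_managers = 0
    monitor_seen = False
    for line in ps_output.splitlines():
        if manager_marker in line:
            job_managers += 1
        if any(marker in line for marker in monitor_markers):
            monitor_seen = True
    return job_managers, monitor_seen


if __name__ == "__main__":
    # Made-up listing, for illustration only.
    listing = (
        "usatlas1 4001 globus-job-manager -type condor\n"
        "usatlas1 4002 globus-job-manager -type condor\n"
        "usatlas1 4003 /bin/sh grid_monitor.sh\n"
    )
    managers, monitor = summarize_processes(listing)
    print("job managers:", managers, "grid monitor running:", monitor)
```

On a real host the listing could come from something like `ps -e -o args`. A persistently high job-manager count with no monitor line would point back at the failure discussed in this ticket.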

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Tue, 11 Sep 2007 00:14 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Hi Jaime,

I see. I guess I just never paid too much attention to this before,
so I didn't notice the load spike when gridex jobs started on our old
ITB gatekeeper.

We had another load spike when LIGO started their work flow
a little while ago, but it went down again, so it looks like all is well!

Thanks a lot,

Horst

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 12 Sep 2007 19:35 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
> Wait -- I just looked again, and now, after 5 or 10 minutes,
> the grid monitor is now running, and the load went down to 1!
>
> So it seems like the patch did the trick after all!

OK, so this patch is now part of VDT 1.8.1.

Thanks everyone!

-alain

Subject: [vdt-support #2922] Open Science Grid: grid monitor for GridEx jobs on OUHEP_ITB (osgitb1.nhn.ou.edu) still not worki... ISSUE=4004 PROJ=71
Date: Wed, 12 Sep 2007 19:44 +0000
To: vdt-support@OPENSCIENCEGRID.ORG
From: Open Science Grid FootPrints <osg@tick-indy.globalnoc.iu.edu>
Great!

I am closing this GOC ticket. Alain, please close corresponding VDT ticket.

Arvind

Assignees: Operations Workgroup, Arvind Gopu, OSG Support Centers, VDT
Status: Closed
Originating VO Support Center: DOSAR
Destination VO Support Center: VDT
Originating Ticket Number:
Destination Ticket Number:

Thank You,
OSG Grid Operations Center
goc@opensciencegrid.org, 317-278-9699
info: http://www.opensciencegrid.org
rss: http://www.grid.iu.edu/news/