| # | Thu Jan 24 11:42:52 2008 | wbetts@bnl.gov - Ticket created |

Hi Scot et al.,

I went through the RSV installation instructions in the OSG CE installation guide, but something seems to have gone amiss. Condor-Devel got installed and is running (I think), but there is probably some sort of environment conflict with the existing Condor installation on the gatekeeper (v6.8.4). If I'm interpreting things correctly, the RSV jobs are winding up in that Condor queue rather than in the Condor-Devel queue. Can we work on figuring out what went wrong and how to fix it?

Thank you,
Wayne Betts

| # | Thu Jan 24 12:46:02 2008 | roy - Given to kronenfe |
| # | Thu Jan 24 12:46:14 2008 | roy - Priority changed from (no value) to '3' |
| # | Thu Jan 24 12:46:14 2008 | roy - Fix scheduled NR added |
| # | Thu Jan 24 12:48:08 2008 | roy - Correspondence added |

Hi Wayne,

Thanks for following up on our discussion today at the VDT Office Hours with a ticket. I've assigned it to Scot Kronenfeld. He made the RSV probes work with Condor, so this is right up his alley.

-alain

-----------------------------------------------------------------
Alain Roy                        vdt-support@opensciencegrid.org
VDT Support                      http://vdt.cs.wisc.edu/support.html

| # | Thu Jan 24 12:48:08 2008 | RT_System - Status changed from 'new' to 'open' |
| # | Thu Jan 24 13:12:51 2008 | kronenfe - Correspondence added |

Hi Wayne,

The reason I mentioned making sure that you are running condor_q against the right Condor is that it does not matter which condor_q binary you run (i.e. the pre-installed one or the one in $VDT_LOCATION/condor-devel/bin). What matters is the value of the environment variable CONDOR_CONFIG.

If you do the following:

    cd $VDT_LOCATION
    . setup.sh
    echo $CONDOR_CONFIG

you should see your pre-installed Condor's config file. Condor-Devel does not put anything into the global setup.sh file, in order to minimize conflicts with your main Condor install. At this point, if you run condor_q, you should not see any RSV jobs. If you do, we'll have to resolve that issue.

To see your Condor-Devel queue, do the following:

    . vdt/etc/condor-devel-env.sh
    echo $CONDOR_CONFIG    # this should now point at Condor-Devel's config file
    condor_q

Ideally, you would now see the RSV jobs (and only the RSV jobs).

Let me know if the RSV jobs are running in the wrong Condor instance, and if so, we'll debug further.

Thanks,
Scot
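
Scot's check can be wrapped in a small shell helper that sources each environment file in a subshell, so the current shell's CONDOR_CONFIG is left untouched. This is a minimal sketch, assuming the file layout described in this ticket ($VDT_LOCATION/setup.sh and $VDT_LOCATION/vdt/etc/condor-devel-env.sh); the function names config_from_env_file and compare_queues are hypothetical and not part of the VDT.

```shell
#!/bin/sh
# Hypothetical helper (not part of the VDT): print the CONDOR_CONFIG value
# that an environment file selects. The file is sourced in a subshell, so
# the caller's own CONDOR_CONFIG is not modified. Prints an empty line if
# the file does not set the variable (or cannot be sourced).
config_from_env_file() {
    (
        unset CONDOR_CONFIG
        . "$1" >/dev/null 2>&1
        printf '%s\n' "$CONDOR_CONFIG"
    )
}

# Hypothetical helper: for the main setup.sh and the Condor-Devel env file,
# show which config each one selects and, if condor_q is on the PATH, list
# the queue of the Condor instance that config points at.
compare_queues() {
    vdt=${1:-$VDT_LOCATION}
    for env_file in "$vdt/setup.sh" "$vdt/vdt/etc/condor-devel-env.sh"; do
        cfg=$(config_from_env_file "$env_file")
        echo "== $env_file -> CONDOR_CONFIG=$cfg"
        if command -v condor_q >/dev/null 2>&1 && [ -n "$cfg" ]; then
            # Per Scot's explanation, the queue shown depends on
            # CONDOR_CONFIG, not on which condor_q binary runs.
            CONDOR_CONFIG=$cfg condor_q
        fi
    done
}
```

Usage: `compare_queues /opt/OSG-0.8.0`. In the healthy case, the RSV jobs appear only under the condor-devel-env.sh entry.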

| # | Thu Jan 24 13:51:08 2008 | wbetts@bnl.gov - Correspondence added |

Here is the state of things:

    [root@stargrid02 OSG]# echo $VDT_LOCATION
    /opt/OSG-0.8.0
    [root@stargrid02 OSG]# echo $CONDOR_CONFIG
    /etc/condor/condor_config   <--- This is the pre-existing condor config file, as expected
    [root@stargrid02 OSG]# condor_q

    -- Submitter: stargrid02.rcf.bnl.gov : <130.199.6.168:20971> : stargrid02.rcf.bnl.gov
     ID        OWNER    SUBMITTED    RUN_TIME    ST PRI SIZE CMD
    142781.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142782.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142783.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142784.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142785.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142786.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142787.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142788.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142789.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142790.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142791.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142792.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142793.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142794.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142795.0   wbetts   1/22 17:18   0+00:00:00  H  0   9.8  probe_wrapper.pl /
    142796.0   wbetts   1/22 17:18   0+00:00:00  I  0   9.8  gratia-script-cons
    142797.0   wbetts   1/22 17:18   0+00:00:00  I  0   9.8  html-consumer
    142798.0   wbetts   1/22 17:18   0+00:00:00  I  0   9.8  rotate_html_files.

    18 jobs; 3 idle, 0 running, 15 held

Now I look at the condor-devel queue:

    [root@stargrid02 OSG]# . vdt/etc/condor-devel-env.sh
    [root@stargrid02 OSG]# echo $CONDOR_CONFIG
    /opt/OSG-0.8.0/condor-devel/etc/condor_config
    [root@stargrid02 OSG]# condor_q

    -- Submitter: stargrid02.rcf.bnl.gov : <130.199.6.168:36559> : stargrid02.rcf.bnl.gov
     ID        OWNER    SUBMITTED    RUN_TIME    ST PRI SIZE CMD

    0 jobs; 0 idle, 0 running, 0 held
    [root@stargrid02 OSG]#

This is unchanged since Tuesday, when I tried to get RSV working. My first guess is that condor-cron spawns a process to submit the jobs, but the environment gets corrupted because the OSG setup scripts have links to them in /etc/profile.d. Is that possible? I'll try to get a crash course in condor-cron...

-Wayne
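
Wayne's two listings can be summarized mechanically by counting the rows whose command looks like an RSV job in each queue. The sketch below is a hypothetical filter (count_rsv_jobs is not a VDT tool); the job names it matches are taken from the condor_q output above, where condor_q truncates the CMD column (for example gratia-script-cons).

```shell
#!/bin/sh
# Hypothetical filter: read condor_q output on stdin and print how many
# job rows look like RSV jobs. The patterns are the (truncated) command
# names that appear in the listing in this ticket. grep -c prints 0 and
# exits 1 when nothing matches; "|| true" keeps the exit status at 0.
count_rsv_jobs() {
    grep -cE 'probe_wrapper\.pl|gratia-script-cons|html-consumer|rotate_html_files' || true
}
```

Usage: `CONDOR_CONFIG=/etc/condor/condor_config condor_q | count_rsv_jobs`, then the same with the Condor-Devel config. For the two listings above, this would report 18 for the main queue and 0 for Condor-Devel.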

| # | Thu Jan 24 14:27:45 2008 | kronenfe - Correspondence added |

Wayne,

The osg-rsv startup script ($VDT_LOCATION/post-install/osg-rsv) sources $VDT_LOCATION/vdt/etc/condor-devel-env.sh, which sets CONDOR_CONFIG to point at the Condor-Devel instance. It then submits the jobs with condor_submit as the user you run RSV as (which looks to be wbetts).

Since your last email shows that sourcing condor-devel-env.sh supplies the right value for CONDOR_CONFIG, one possibility is that there was an error when the startup script sourced that file (permissions or something similar). That does not seem likely, but I am confused about how else the jobs could end up in your main Condor's queue.

Have you tried stopping and starting osg-rsv since you noticed this error?

    vdt-control --off osg-rsv
    vdt-control --on osg-rsv

If the problem persists, you can add a debugging line to the osg-rsv startup script ($VDT_LOCATION/post-install/osg-rsv). Right underneath the place where condor-devel-env.sh is sourced, print $CONDOR_CONFIG:

    ##
    ## Source the Condor-Devel setup file
    ##
    . $VDT_LOCATION/vdt/etc/condor-devel-env.sh
    echo $CONDOR_CONFIG

Then stop and start osg-rsv again. You'll have to look at the bottom of $VDT_LOCATION/vdt-install.log for the output of that echo command.

Let me know how that goes, and we'll go from there.

Thanks,
Scot
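
Scot's first hypothesis (the startup script failing to source condor-devel-env.sh) can be checked without restarting anything. This is a hedged sketch: check_condor_devel_env is a hypothetical name, and the readability test is only meaningful when run as the RSV user (wbetts here), since root can read almost any file.

```shell
#!/bin/sh
# Hypothetical check: does an env file exist, is it readable, and does
# sourcing it set CONDOR_CONFIG to a config file that actually exists?
# Prints "ok: <config>" on success, or a reason and returns 1.
check_condor_devel_env() {
    env_file=$1
    if [ ! -f "$env_file" ]; then
        echo "missing: $env_file"; return 1
    fi
    if [ ! -r "$env_file" ]; then
        # Only meaningful for non-root users; root can read the file anyway.
        echo "not readable: $env_file"; return 1
    fi
    # Source in a subshell so the caller's environment is left alone.
    cfg=$( unset CONDOR_CONFIG; . "$env_file" >/dev/null 2>&1; printf '%s' "$CONDOR_CONFIG" )
    if [ -z "$cfg" ]; then
        echo "CONDOR_CONFIG not set by $env_file"; return 1
    fi
    if [ ! -f "$cfg" ]; then
        echo "CONDOR_CONFIG points at a missing file: $cfg"; return 1
    fi
    echo "ok: $cfg"
}
```

Usage, as the RSV user: `check_condor_devel_env "$VDT_LOCATION/vdt/etc/condor-devel-env.sh"`.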

| # | Fri Jan 25 15:32:29 2008 | wbetts@bnl.gov - Correspondence added |

Message body not shown because it is too large or is not plain text.

| # | Fri Jan 25 17:11:26 2008 | kronenfe - Correspondence added |

> On a vaguely related note: the "Condor guy" here at BNL (Alex Withers)
> has suggested installing the latest stable Condor 7.x, replacing our
> current condor 6.8.4. Might that handle the RSV jobs and eliminate the
> need for condor-devel?
>
> -Wayne

Yes, this should work. For the 1.8.2 release we will be working on the RSV infrastructure so that it works with pre-installed Condor 7.0 instances, which will mostly eliminate the need for Condor-Devel.

For now, with VDT 1.8.1, you should be able to set VDTSETUP_CONDOR_DEVEL_LOCATION to the location of your Condor 7.0 install (in this case, also your main Condor install), and RSV will use that location. I'm going to test this type of install locally to make sure there are no additional steps, and I'll get back to you with the verdict. In the meantime, if you decide not to take this approach, let me know and I will help you debug the issue with the 6.8.x install further.

Thanks,
Scot

| # | Mon Jan 28 09:45:28 2008 | kronenfe - Correspondence added |

Hi Wayne,

I ran a VDT 1.8.1 test locally using a pre-existing Condor 7 installation for both Condor and Condor-Devel. The RSV jobs and some manual jobs I submitted all ran fine.

During the install, you usually set VDTSETUP_CONDOR_LOCATION to point at your pre-existing install. When installing OSG-RSV, you will also need to set VDTSETUP_CONDOR_DEVEL_LOCATION to point at this same install.

Let me know if you have any questions.

Thanks,
Scot
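
In shell terms, Scot's instructions amount to exporting both variables with the same value before running the install. This is a minimal sketch, assuming the Condor 7.0 install lives at /home/condor (the location Wayne reports later in this ticket); CONDOR_BASE is a hypothetical convenience variable, and the install command itself is deliberately not shown because it depends on the site's OSG/VDT procedure.

```shell
#!/bin/sh
# Assumption: the pre-existing Condor 7.0 install is at /home/condor.
# CONDOR_BASE is a hypothetical helper variable; override it for your site.
CONDOR_BASE=${CONDOR_BASE:-/home/condor}

# Main Condor location used by the VDT install.
VDTSETUP_CONDOR_LOCATION=$CONDOR_BASE
# Point Condor-Devel (used by OSG-RSV) at the same install.
VDTSETUP_CONDOR_DEVEL_LOCATION=$CONDOR_BASE
export VDTSETUP_CONDOR_LOCATION VDTSETUP_CONDOR_DEVEL_LOCATION

if [ ! -d "$CONDOR_BASE" ]; then
    echo "warning: $CONDOR_BASE does not exist" >&2
fi
echo "VDTSETUP_CONDOR_LOCATION=$VDTSETUP_CONDOR_LOCATION"
echo "VDTSETUP_CONDOR_DEVEL_LOCATION=$VDTSETUP_CONDOR_DEVEL_LOCATION"
```

Source this file with `.` rather than executing it. Executing it would set the variables only in a child shell, not in the shell that later runs the installer.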

| # | Wed Jan 30 15:12:07 2008 | wbetts@bnl.gov - Correspondence added |

Hi Scot,

After upgrading Condor to 7.0 and configuring it to allow 'local' universe jobs, the RSV jobs appear to be running successfully. Can you check whether the Gratia probe data is making it into the database at rsv.grid.iu.edu?

One small hitch, though: the init script for osg-rsv includes

    ##
    ## Source the Condor-Devel setup file
    ##
    . $VDT_LOCATION/vdt/etc/condor-devel-env.sh

I ran the configure_osg_rsv script after setting VDTSETUP_CONDOR_DEVEL_LOCATION=/home/condor (the base of the 7.0 installation), but this line still refers to the condor-devel in $VDT_LOCATION. I had to comment it out, of course, since we are not using the VDT Condor-Devel. The hitch is that vdt-control regenerates the init scripts each time it enables a service, so this change is lost on every "vdt-control --off osg-rsv; vdt-control --on osg-rsv" cycle. Do you know how I can permanently get rid of this line?

Thank you for the assistance,
-Wayne

| # | Wed Jan 30 16:17:46 2008 | kronenfe - Correspondence added |

Hi Wayne,

> After upgrading condor to 7.0 and configuring it to allow 'local'
> universe jobs, the RSV jobs appear to be running successfully. Can you
> check to see if the Gratia probe data is making it into the database at
> rsv.grid.iu.edu?

Arvind will have to check this. I've CCed him on the email.

> One small hitch though -- the init script for osg-rsv includes
>
> ##
> ## Source the Condor-Devel setup file
> ##
> . $VDT_LOCATION/vdt/etc/condor-devel-env.sh

The condor-devel-env.sh file should still exist, and it should point at /home/condor: it should set CONDOR_CONFIG and CONDOR_DEVEL_LOCATION to their correct values for that install. Sourcing it may not be necessary, but it shouldn't hurt either, so you can leave that line in place :)

Let me know if the values in that file look correct.

Thanks,
Scot
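
Scot's suggestion (check that condor-devel-env.sh sets CONDOR_CONFIG and CONDOR_DEVEL_LOCATION for /home/condor) could be scripted as below. This is a hedged sketch: show_condor_devel_env is a hypothetical name, and it assumes only that the file exports the two variable names Scot mentions; the expected directory /home/condor is Wayne's install location.

```shell
#!/bin/sh
# Hypothetical check: source condor-devel-env.sh in a subshell, print the
# two variables it is expected to set, and return nonzero if
# CONDOR_DEVEL_LOCATION is not the expected directory or CONDOR_CONFIG is
# left empty. Usage: show_condor_devel_env <env-file> <expected-dir>
show_condor_devel_env() {
    (
        unset CONDOR_CONFIG CONDOR_DEVEL_LOCATION
        . "$1" >/dev/null 2>&1
        echo "CONDOR_DEVEL_LOCATION=$CONDOR_DEVEL_LOCATION"
        echo "CONDOR_CONFIG=$CONDOR_CONFIG"
        [ "$CONDOR_DEVEL_LOCATION" = "$2" ] && [ -n "$CONDOR_CONFIG" ]
    )
}
```

Usage: `show_condor_devel_env "$VDT_LOCATION/vdt/etc/condor-devel-env.sh" /home/condor`. A nonzero status suggests the file still refers to the VDT's own condor-devel directory.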

| # | Wed Jan 30 17:02:51 2008 | agopu@indiana.edu - Correspondence added |

Hi Wayne,

Yes, your STAR-BNL resource appears to have been uploading RSV records consistently for the past several hours.

As a side note, if you've not done so already, feel free to configure your VDT Apache to serve RSV pages as explained here:

http://rsv.grid.iu.edu/documentation/vdt-package.html#configure_osg_rsv_script_localweb

If you do that, you'll also be able to view your RSV results locally at https://stargrid02.rcf.bnl.gov:8443/rsv

And btw, ignore the directory-CE-permissions probe warnings; we're working on a better version of that probe. I do notice expired CRLs, though, that you might want to fix :-)

Cheers,
Arvind

| # | Fri Feb 01 11:35:03 2008 | kronenfe - Comments added |

From John, in response to my question about whether the su -c command is setting the environment incorrectly:

    [root@mstr1 ~]# . $VDT_LOCATION/setup.sh
    [root@mstr1 ~]# . /home/osg/vdt/etc/condor-devel-env.sh
    [root@mstr1 ~]# su -c "env" osg-rsv | grep CONDOR_CONFIG
    VDTSETUP_CONDOR_CONFIG=/home/condor/condor_config
    CONDOR_CONFIG=/home/condor/condor_config

which is my production Condor.

I do set CONDOR_CONFIG in /etc/csh.cshrc, and csh is osg-rsv's default shell... could that have anything to do with it?

| # | Fri Feb 01 11:37:49 2008 | kronenfe - Comments added |

> i do set CONDOR_CONFIG in /etc/csh.cshrc which is osg-rsv's default
> shell...could that have anything to do with it?

This explains the problems that Wayne was having. In John's case, su -c runs the command through osg-rsv's shell (csh), and csh reads /etc/csh.cshrc at startup, so the CONDOR_CONFIG set there overrides the value exported by condor-devel-env.sh and the jobs go to the production Condor. We need to edit the RSV documentation to mention this issue.
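
Following Scot's conclusion, a site could scan the shell startup files for anything that sets CONDOR_CONFIG before relying on su -c. The sketch below is hypothetical (find_condor_config_overrides is not a VDT tool), and the list of files to scan is an assumption: it covers the csh file from John's case and the /etc/profile.d scripts Wayne suspected, and should be adjusted for the login shell of your RSV user.

```shell
#!/bin/sh
# Hypothetical scan: print "file:line:text" for every line in the given
# startup files that mentions CONDOR_CONFIG. Missing or unreadable files
# are skipped. Always returns 0; empty output means nothing was found.
find_condor_config_overrides() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            grep -n 'CONDOR_CONFIG' "$f" | while IFS= read -r line; do
                printf '%s:%s\n' "$f" "$line"
            done
        fi
    done
    return 0
}
```

Usage: `find_condor_config_overrides /etc/csh.cshrc /etc/csh.login /etc/profile /etc/profile.d/*`, then compare with what the RSV user's shell actually sees, using John's command: `su -c "env" osg-rsv | grep CONDOR_CONFIG`.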

| # | Fri Feb 01 11:37:50 2008 | kronenfe - Status changed from 'open' to 'resolved' |

RT 3.8.2 Copyright 1996-2008 Best Practical Solutions, LLC.