Comments about this user:
No comment entered about this user This user's 10 highest priority tickets:
|
|
| # | Tue Dec 02 10:45:06 2008 | weigand@fnal.gov - Ticket created | [Reply] | |||||||||
VDT 1.10.1n VOMS 1.8.8-2 MySQL 5.0.67 If there has been no activity against the VOMS proxy server (/usr/local/osg-voms/glite/sbin/edg-voms) for 8+ hours, the server stops processing proxy requests. The server has to be re-cycled to become functional again. For a large VO, this is not likely a problem. However for a small VO, 8 hours seems rather short. From what I have read, the cause appears to be a timeout on the MySql connection (8 hours is the default). I believe the VOMS server uses hibernate and c3p0 to manage connection sessions. For what it is worth, these are some links I have found on this connection timeout problem. http://www.codefin.net/2007/05/hibernate-and-mysql-connection-timeouts.html http://onzoob.com/archive/read/troubleshooting-c3p0-connection-pool At the back end of this is a voms-proxy-init that fails due to the connection problem. I have also listed the log entries for a good proxy and failed proxy since there is no real indication in the log that there is a problem. One confusing aspect of logs is that there are 2 lines showing "... LOG_ERROR:STARTUP ..." which appear on both a good and failed proxy request. John Weigand ############################################################ voms-proxy-init --voms oiv_test1 Enter GRID pass phrase: Your identity: /DC=org/DC=doegrids/OU=People/CN=John Weigand 458491 Creating temporary proxy ............................................. Done Contacting cms-xen3.fnal.gov:15000 [/DC=org/DC=doegrids/OU=Services/CN=http/cms-xen3.fnal.gov] "oiv_test1"gss_assist_get_unwrap failure: globus_gss_assist token :3: failure: Connection closed Failed Error: GSS authentication failure globus_gss_assist token :3: failure: Connection closed None of the contacted servers for oiv_test1 were capable of returning a valid AC for the user. ########################################################### /usr/local/osg-voms/glite/log/oiv_test1 log file show no error on failure.==== GOOD ( after 8 hours of inactivity ) =============================== Mon Dec 1 19:46:42 2008:cms-xen3.fnal.gov:vomsd[26644]: msg="LOG_INFO:REQUEST:Listen (Server.cpp:361):Received connection from: 131.225.204.129:30143." Mon Dec 1 19:46:42 2008:cms-xen3.fnal.gov:vomsd[26644]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:600):Starting Executor with pid = 1418" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:612):Self : /DC=org/DC=doegrids/OU=Services/CN=http/cms-xen3.fnal.gov" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:613):Self CA : /DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:620):At: Mon Dec 1 19:46:43 2008. Received Contact :" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:621): user: /DC=org/DC=doegrids/OU=Services/CN=cms-xen3.fnal.gov" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:622): ca : /DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:623): serial: 5F11" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_ERROR:STARTUP:my_recv (globuswrap.c:112):trueres = 1." Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_ERROR:STARTUP:my_recv (globuswrap.c:112):trueres = 1." Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Execute (vomsd.cc:737):Userid = "10"" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:REQUEST:Execute (vomsd.cc:753):Next command : G/oiv_test1" Mon Dec 1 19:46:43 2008:cms-xen3.fnal.gov:vomsd[1418]: msg="LOG_INFO:RESULT:Execute (vomsd.cc:1022):Request Result: /oiv_test1/Role=NULL/Capability=NULL" ==== FAILED ( after 9 hours of inactivity) ============================== Tue Dec 2 04:46:40 2008:cms-xen3.fnal.gov:vomsd[26644]: msg="LOG_INFO:REQUEST:Listen (Server.cpp:361):Received connection from: 131.225.204.129:1743." Tue Dec 2 04:46:40 2008:cms-xen3.fnal.gov:vomsd[26644]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:600):Starting Executor with pid = 7067" Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:612):Self : /DC=org/DC=doegrids/OU=Services/CN=http/cms-xen3.fnal.gov" Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:613):Self CA : /DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1" Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:620):At: Tue Dec 2 04:46:41 2008. Received Contact :" Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:621): user: /DC=org/DC=doegrids/OU=Services/CN=cms-xen3.fnal.gov" Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:622): ca : /DC=org/DC=DOEGrids/OU=Certificate Authorities/CN=DOEGrids CA 1" Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_INFO:REQUEST:Run (vomsd.cc:623): serial: 5F11" Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_ERROR:STARTUP:my_recv (globuswrap.c:112):trueres = 1." Tue Dec 2 04:46:41 2008:cms-xen3.fnal.gov:vomsd[7067]: msg="LOG_ERROR:STARTUP:my_recv (globuswrap.c:112):trueres = 1." |
||||||||||||
| # | Tue Dec 02 12:05:45 2008 | roy - Given to cat | ||
| # | Tue Dec 02 15:30:53 2008 | cat - Priority changed from (no value) to '9' | ||
| # | Tue Dec 02 15:30:54 2008 | cat - Status changed from 'new' to 'open' | ||
| # | Wed Dec 03 11:59:42 2008 | cat - Correspondence added | [Reply] | |
|
John: > If there has been no activity against the VOMS proxy server for 8+ hours, the > server stops processing proxy requests. The server has to be re-cycled to > become functional again. Wow, that's pretty nasty! I will report this one to Vincenzo and ask him to debug it. One comment though: > [...] the cause appears to be a timeout on the MySQL connection (8 hours is > the default). I believe the VOMS server uses hibernate and c3p0 to manage > connection sessions. For what it is worth, these are some links I have found > on this connection timeout problem. This is not quite accurate. Hibernate and c3p0 are Java database layers that provide developers with higher-level data abstractions. The VOMS server is a C/C++ program, not Java. I believe you are thinking of VOMS Admin, which is a Java web application, but VOMS Admin is not involved in handling VOMS client (voms-proxy-init) requests. Now, perhaps there is a similar timeout issue happening on the MySQL client API level, it would just be happening in C, not Java. Anyway, I will pass this along to Vincenzo and keep you posted. -- Tim |
||||
| # | Wed Dec 03 12:14:23 2008 | cat - Correspondence added | [Reply] | |
|
John: I have filed the VOMS bug in the LCG Savannah. You can track that ticket at: https://savannah.cern.ch/bugs/index.php?44936 -- Tim |
||||
| # | Thu Dec 04 08:56:29 2008 | cat - Correspondence added | [Reply] | |
|
John: This is Vincenzo's response: > The interesting part here is this: >> Error: GSS authentication failure >> globus_gss_assist token :3: failure: Connection closed > What is the status of certificates/CRL on both machines? > Also, note that if there is a problem with GSS, nothing should appear on the > logs, since voms-proxy-init fails before getting authenticated. > Is this reproducible on the usual machine? > I tried on mine, and this problem does not seem to be there. Can you please comment on the state of certificates and CRLs on your test machine? Also, have you reproduced this behavior reliably? -- Tim |
||||
| # | Thu Dec 04 11:35:30 2008 | weigand@fnal.gov - Correspondence added | [Reply] | |||||||||
embedded.... John Weigand Tim Cartwright via RT wrote: > John: Nothing is wrong with either the CRLs or certificates on the machines.> > This is Vincenzo's response: > > >> The interesting part here is this: >>> > >>> Error: GSS authentication failure >>> globus_gss_assist token :3: failure: Connection closed >>> > >> What is the status of certificates/CRL on both machines? >> If that were the case, I would not think that re-cycling the voms proxy server would have any effect. The test was run with the client (voms-proxy-init) and proxy server (edg-voms) on the same machine. >> Also, note that if there is a problem with GSS, nothing should appear on the >> logs, since voms-proxy-init fails before getting authenticated. >> any logs file. I am not saying that there is a problem with GSS. That is what the voms-proxy-init is saying. The purpose of showing the server logs is to indicate that the proxy request is being sent to the proxy server. It is doing some processing, however it just stops at that point. The first log shows a good one. The 2nd set shows how it just stops. What or if it actually sends anything back is a personal thing between the server and client that I have no clue about. The output of the voms-proxy-init says it is contacting the server. And the server log I show indicates it has sent something. One guess is that this may be a timeout (very short at that) and the actual error message is misleading. > !00% reproducible. I did it over several days. However, you do have to > >> Is this reproducible on the usual machine? >> wait 8 hours for the connections to drop. > That is totally baffling. I have to 2 test nodes with VOMS on them and > >> I tried on mine, and this problem does not seem to be there. >> both have the same problem. I just ran against the 2nd one and failed. Once I recycled the voms server, I was successful. And, just in case the issue of being on the same node for the client and server comes up next. This test was done from a different node as the server. Are you sure you are waiting 8 hours (or does your my.cnf set the default longer? Are you sure there is no activity for that period? > I also ran a script against the mysql database doing a 'show proceslist' > Can you please comment on the state of certificates and CRLs on your test > machine? Also, have you reproduced this behavior reliably? > > -- Tim > > to check on connections. It ran every hour and you can see the connections drop between 22:13 and 23:13 for the 3 test VOs my VOMS has. ################################################### # Tue Dec 2 22:13:01 CST 2008#-------------------------------------------------- Id User Host db Command Time State Info 1 oiv_test1_adm localhost voms_oiv_test1 Sleep 27582 NULL 2 oiv_test2_adm localhost voms_oiv_test2 Sleep 27688 NULL 3 oiv_test3_adm localhost voms_oiv_test3 Sleep 27687 NULL 3355 oiv_test1_adm localhost:33943 voms_oiv_test1 Sleep 83 NULL 3358 oiv_test2_adm localhost:33946 voms_oiv_test2 Sleep 70 NULL 3363 oiv_test3_adm localhost:33951 voms_oiv_test3 Sleep 64 NULL 3400 oiv_test3_adm localhost:33988 voms_oiv_test3 Sleep 93 NULL 3401 oiv_test1_adm localhost:33989 voms_oiv_test1 Sleep 14 NULL 3402 oiv_test1_adm localhost:33990 voms_oiv_test1 Sleep 14 NULL 3403 oiv_test3_adm localhost:33991 voms_oiv_test3 Sleep 85 NULL 3404 oiv_test2_adm localhost:33992 voms_oiv_test2 Sleep 2 NULL 3405 oiv_test2_adm localhost:33993 voms_oiv_test2 Sleep 2 NULL 3406 oiv_test3_adm localhost:33994 voms_oiv_test3 Sleep 72 NULL 3407 oiv_test3_adm localhost:33995 voms_oiv_test3 Sleep 72 NULL 3408 oiv_test2_adm localhost:33996 voms_oiv_test2 Sleep 2 NULL 3409 oiv_test1_adm localhost:33997 voms_oiv_test1 Sleep 14 NULL 3410 oiv_test1_adm localhost:33998 voms_oiv_test1 Sleep 14 NULL 3411 oiv_test2_adm localhost:33999 voms_oiv_test2 Sleep 2 NULL 3412 root localhost NULL Query 0 NULL show processlist ################################################### # Tue Dec 2 23:13:01 CST 2008#-------------------------------------------------- Id User Host db Command Time State Info 3801 oiv_test1_adm localhost:34388 voms_oiv_test1 Sleep 83 NULL 3805 oiv_test2_adm localhost:34392 voms_oiv_test2 Sleep 70 NULL 3807 oiv_test3_adm localhost:34394 voms_oiv_test3 Sleep 64 NULL 3844 oiv_test3_adm localhost:34431 voms_oiv_test3 Sleep 92 NULL 3845 oiv_test1_adm localhost:34432 voms_oiv_test1 Sleep 14 NULL 3846 oiv_test3_adm localhost:34433 voms_oiv_test3 Sleep 82 NULL 3847 oiv_test2_adm localhost:34434 voms_oiv_test2 Sleep 2 NULL 3848 oiv_test3_adm localhost:34435 voms_oiv_test3 Sleep 69 NULL 3849 oiv_test3_adm localhost:34436 voms_oiv_test3 Sleep 69 NULL 3850 oiv_test2_adm localhost:34437 voms_oiv_test2 Sleep 2 NULL 3851 oiv_test2_adm localhost:34438 voms_oiv_test2 Sleep 2 NULL 3852 oiv_test1_adm localhost:34439 voms_oiv_test1 Sleep 14 NULL 3853 oiv_test1_adm localhost:34440 voms_oiv_test1 Sleep 14 NULL 3854 oiv_test1_adm localhost:34441 voms_oiv_test1 Sleep 14 NULL 3855 oiv_test2_adm localhost:34442 voms_oiv_test2 Sleep 2 NULL 3856 root localhost NULL Query 0 NULL show processlist |
||||||||||||
| # | Thu Dec 04 11:48:32 2008 | roy - Correspondence added | [Reply] | |||||||||
Tim-- How hard would it be for you to replicate this problem here on gk1? Install VOMS today, and see if it's failing tomorrow? We could then give Vincenzo access to the computer and logs. -alain |
||||||||||||
| # | Mon Dec 08 09:39:09 2008 | cat - Priority changed from '9' to '-3' | ||
| # | Mon Dec 08 09:39:28 2008 | cat - Priority changed from '-3' to '9' | ||
| # | Tue Dec 09 09:40:11 2008 | cat - Membership in ticket #4532 added | ||
| # | Wed Dec 10 09:02:48 2008 | cat - Comments added | [Reply] | |||||||
Commit comment: Creating a new branch to test Vincenzo's fix to the VOMS-MySQL timeout problem. Changed files: A vdt/branches/vdt-1.10.1-voms-timeout/ To generate a diff: svn diff -c 8437 file:///p/condor/workspaces/vdt/svn |
||||||||||
| # | Wed Dec 10 09:31:32 2008 | cat - Comments added | [Reply] | |||||||
Commit comment: Force test cache to cat-voms-timeout for testing. Changed files: U vdt/branches/vdt-1.10.1-voms-timeout/defs To generate a diff: svn diff -c 8438 file:///p/condor/workspaces/vdt/svn |
||||||||||
| # | Thu Dec 11 13:24:46 2008 | cat - Comments added | [Reply] | |||||||
Commit comment: Two small changes to make a new build that includes Vincenzo's working CVS tag with the voms-mysql fix. Changed files: U vdt/branches/vdt-1.10.1-voms-timeout/VOMS/nmi/voms-mysql.cvs U vdt/branches/vdt-1.10.1-voms-timeout/defs To generate a diff: svn diff -c 8468 file:///p/condor/workspaces/vdt/svn |
||||||||||
| # | Fri Dec 12 10:24:42 2008 | weigand@fnal.gov - Correspondence added | [Reply] | |||||||||||
This issued appears resolved. I installed yesterday from the http://vdt.cs.wisc.edu/test-cache/cat-voms-timeout cache at 14:27 and tested this morning at 09:45. Before and after running the tests, I viewed the connections in the MySql database. It appears the solution was to remove the persistent connection and open a database connection with each request. This would definitely eliminate any connection timeout issues. Performance degradation may or may not be a problem with this solution as it would depend on the proxy request rate and the number of VOMS servers using the same database as they do at FNAL. This I can make no judgment on, but do not feel it will be an issue (hmmmm.... I guess I just made a judgment). John Weigand |
||||||||||||||
| # | Fri Dec 12 10:33:40 2008 | cat - Correspondence added | [Reply] | |||||||||||
John, thanks for the update, just about to do my own test. But please file this update in the Savannah ticket! That's where it needs to be seen. -- Tim On 12 Dec 2008, at 10:13, John Weigand wrote: > This issued appears resolved. > > I installed yesterday from the http://vdt.cs.wisc.edu/test-cache/cat-voms-timeout > cache at 14:27 and tested this morning at 09:45. > > Before and after running the tests, I viewed the connections in the > MySql database. It appears the solution was to remove the persistent > connection and open a database connection with each request. This > would definitely eliminate any connection timeout issues. > Performance degradation may or may not be a problem with this > solution as it would depend on the proxy request rate and the number > of VOMS servers using the same database as they do at FNAL. This I > can make no judgment on, but do not feel it will be an issue > (hmmmm.... I guess I just made a judgment). > > > John Weigand |
||||||||||||||
| # | Mon Jan 12 16:35:24 2009 | cat - Comments added | [Reply] | |||||||
Commit comment: Switched to final build tag for voms-mysql. Changed files: U vdt/branches/vdt-1.10.1-voms-timeout/VOMS/nmi/voms-mysql.cvs To generate a diff: svn diff -c 8595 file:///p/condor/workspaces/vdt/svn |
||||||||||
| # | Tue Jan 13 14:38:38 2009 | cat - Comments added | [Reply] | |||||||
Commit comment: Tag for first merge from this branch to vdt-1.10.1 branch. Changed files: A vdt/tags/vdt-1.10.1-voms-timeout-merge1/ To generate a diff: svn diff -c 8596 file:///p/condor/workspaces/vdt/svn |
||||||||||
| # | Tue Jan 13 14:54:03 2009 | cat - Comments added | [Reply] | |||||||
Commit comment: Merge from vdt-1.10.1-voms-timeout branch: svn merge -r 8437:8596 $SVN/vdt/branches/vdt-1.10.1-voms-timeout . Revision 8596 = tags/vdt-1.10.1-voms-timeout-merge1. Updated to VOMS-MySQL 3.1.0, which fixes the timeout problem; see RT #4472. Changed files: U vdt/branches/vdt-1.10.1/VOMS/nmi/voms-mysql.cvs U vdt/branches/vdt-1.10.1/defs To generate a diff: svn diff -c 8597 file:///p/condor/workspaces/vdt/svn |
||||||||||
| # | Tue Jan 13 15:07:41 2009 | cat - Correspondence added | [Reply] | |||||||||
John: You submitted a bug report starting with the following: > If there has been no activity against the VOMS proxy server (/usr/ > local/osg-voms/glite/sbin/edg-voms) for 8+ hours, the server stops > processing proxy requests. We recently got a final CVS tag from Vincenzo for this fix, and I now have a build in place, ready to release. If you would like to test it, please use the 1.10.99 cache: pacman -get http://vdt.cs.wisc.edu/vdt_11099_cache:VOMS We'll be running our automated tests on this new VOMS build, but we don't have the facilities to test the timeout issue. Therefore, I have done a manual test to confirm the fix. If all goes well, this change will be released later this week or early next week. -- Tim |
||||||||||||
| # | Tue Jan 13 15:16:41 2009 | weigand@fnal.gov - Correspondence added | [Reply] | |||||||||
Will do john Tim Cartwright via RT wrote: > John: > > You submitted a bug report starting with the following: > > >> If there has been no activity against the VOMS proxy server (/usr/ >>> local/osg-voms/glite/sbin/edg-voms) for 8+ hours, the server stops >> processing proxy requests. >> > We recently got a final CVS tag from Vincenzo for this fix, and I now > have a build in place, ready to release. If you would like to test > it, please use the 1.10.99 cache: > > pacman -get http://vdt.cs.wisc.edu/vdt_11099_cache:VOMS > > We'll be running our automated tests on this new VOMS build, but we > don't have the facilities to test the timeout issue. Therefore, I > have done a manual test to confirm the fix. > > If all goes well, this change will be released later this week or > early next week. > > -- Tim > > > |
||||||||||||
| # | Wed Jan 14 11:31:34 2009 | cat - Comments added | [Reply] | |||||||
Commit comment: Turned on 1.10.99 testing again, because the latest VOMS build needs testing before release. Changed files: U tests/trunk/test-scripts/tests-to-run To generate a diff: svn diff -c 8598 file:///p/condor/workspaces/vdt/svn |
||||||||||
| # | Tue Feb 03 09:31:00 2009 | cat - Status changed from 'open' to 'resolved' | ||
| # | Tue Feb 03 09:31:01 2009 | cat - Fix scheduled CUR added | ||
| # | Tue Feb 03 09:31:02 2009 | cat - Fixed in 1.10.1s added | ||
Time to display: 3.207669
»|« RT 3.8.2 Copyright 1996-2008 Best Practical Solutions, LLC.