Ticket metadata
Id: 2989
Status: resolved
Priority: 3/0
Queue: vdt-support

Fixed in: (no value)
Fix scheduled: OLD

Owner: Alain Roy
Requestors: litmaath@cern.ch
Cc: David.Smith@cern.ch
john.white@cern.ch
s.burke@rl.ac.uk
AdminCc:

Created: Mon Oct 01 04:07:35 2007
Starts: Not set
Started: Not set
Last Contact: Fri Feb 22 17:09:46 2008
Due: Not set
Closed: Wed Mar 05 21:10:55 2008
Updated: Wed Mar 05 21:10:55 2008 by roy



History
Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Mon, 01 Oct 2007 11:02:53 +0200
To: Maarten Litmaath <litmaath@cern.ch>, David Smith <David.Smith@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [Maarten Litmaath]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Follow-up Comment #2, bug #29930 (project jra1mdw):


Dear VDT Support,
please look into this bug considered critical for WLCG/EGEE.


==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 15:18
Status: None
Open/Closed: Open
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: maart
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

_______________________________________________________


Hello,

We recently had a problem reported by a site using a 64bit, SL4 release of
DPM (and hence the gridftp2 server) that was traced to an internal
globus/gridftp2 lib issue:

from VDT RPM vdt_globus_data_server-VDT1.6.0x86_64_rhas_4-3
in library(s) libglobus_gridftp_server_control_*

function globus_l_xio_gssapi_ftp_write(), from source file
globus_xio_gssapi_ftp.c at approximately line 2590:

    res = globus_xio_driver_pass_write(
        op,
        &handle->auth_write_iov,
        1,
        length,
        cb,
        handle);

Here the argument 'length' refers to the length of the unwrapped
(unencrypted) and binary (not base64) version of the message, whereas
handle->auth_write_iov.iov_len holds the actual length. Thus 'length' is
about 75% of the actual length of the message that the function wants to
send. The fourth argument to globus_xio_driver_pass_write() is the 'wait_for'
parameter, which sets the minimum amount of data that must have been sent to
the kernel before the callback is called. Thus if the kernel accepts more than
about 75% of the length of the message, but less than all of it, the message
is truncated.

Browsing the globus CVS I noticed that recent versions/branches of
globus_xio_gssapi_ftp.c have been reworked and probably do not suffer from
this problem - although I don't know if exactly this problem was ever
recognised explicitly. I don't know if there are any globus/gridftp2 releases
based on the new code, but it is likely we will find problems again from this
bug so it is very desirable to have this fixed - either as a fix to the older
version or by moving to the newer code.

Thank you,
David

_______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: 2007-10-01 09:02 By: Maarten Litmaath <maart>

Dear VDT Support,
please look into this bug considered critical for WLCG/EGEE.


-------------------------------------------------------
Date: 2007-09-28 08:53 By: David Smith <dhsmith>
Hello,

Moved to critical - as I expect this problem will lead to service instability
anywhere where we deploy gridftp2, which is an increasing number of the DPM
sites.

David





_______________________________________________________

Carbon-Copy List:

CC Address | Comment
------------------------------------+-----------------------------
488 | -COM-
vdt-support@opensciencegrid.org |
868 | -UPD-
sburke |
645 | -SUB-




==============================================================================

This item URL is:
<http://savannah.cern.ch/bugs/?29930>

_______________________________________________
Message sent via/by LCG Savannah
http://savannah.cern.ch/
Download (untitled) / with headers
text/plain 1.3k
I'll work with Globus immediately to resolve this bug. Two questions for
you:

> function globus_l_xio_gssapi_ftp_write(), from source file
> globus_xio_gssapi_ftp.c at approximately line 2590:
>
>     res = globus_xio_driver_pass_write(
>         op,
>         &handle->auth_write_iov,
>         1,
>         length,
>         cb,
>         handle);
>
> Here the argument 'length' refers to the length of the unwrapped
> (unencrypted) and binary (not base64) version of the message, whereas
> handle->auth_write_iov.iov_len holds the actual length. Thus 'length' is
> about 75% of the actual length of the message that the function wants to
> send. The fourth argument to globus_xio_driver_pass_write() is the
> 'wait_for' parameter, which sets the minimum amount of data that must have
> been sent to the kernel before the callback is called. Thus if the kernel
> accepts more than about 75% of the length of the message, but less than all
> of it, the message is truncated.

Are you suggesting that we could simply pass
handle->auth_write_iov.iov_len instead of length? Clearly it would
require testing, but is it possible that the fix is as simple as that?

I understand that this is a critical bug, but I don't have a good
feeling for how it is showing up in production. Does it result in some
file transfers failing? What is the symptom of the failure? How often
does it occur?

Thanks!
-alain
Download (untitled) / with headers
text/plain 121b
This is now Globus bug #5590:

http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=5590


I'll keep a close eye on it.
CC: litmaath@cern.ch, john.white@cern.ch, s.burke@rl.ac.uk
Subject: Re: [vdt-support #2989] [bug #29930] gridftp2 server can send truncated control channel messages
Date: Tue, 02 Oct 2007 08:42:16 +0200
To: vdt-support@OPENSCIENCEGRID.ORG
From: David Smith <David.Smith@cern.ch>
Download (untitled) / with headers
text/plain 3.2k

On Oct 1, 2007, at 8:30 PM, Alain Roy via RT wrote:

> I'll work with Globus immediately to resolve this bug. Two questions
> for you:
[...]
>
> Are you suggesting that we could simply pass
> handle->auth_write_iov.iov_len instead of length? Clearly it would
> require testing, but is it possible that the fix is as simple as that?

Hello Alain,

Thanks for taking this up. I believe passing
handle->auth_write_iov.iov_len instead of length would be fine; in the case
we had in production we made a test library which effectively did
that, which resolved the problem. However, it was a small binary patch
(as I wasn't confident of rebuilding the library from source with
absolutely no other change just for the test), so strictly speaking we
did not try out exactly that source change.

>
> I understand that this is a critical bug, but I don't have a good
> feeling for how it is showing up in production. Does it result in some
> file transfers failing? What is the symptom of the failure? How often
> does it occur?

I was a little undecided as to where to place the criticality of the
bug - somewhere between major and critical. So far it has only been
seen at one production site, and it is difficult to reproduce on a
given gridftp2 instance. But we have a relatively small number of
DPMs using the new gridftp2, so the concern is that it would become
more frequent as the number of instances grows.

It was a failure in FTS transfers from one site to another. The
problem showed up only in transfers between those two particular
sites, and the exact conditions required appear to be complex: it
appears to depend on the size of range markers (and thus the size of
messages the server was writing on the control channel), and so on the
transfer rate, number of streams and TCP buffer sizes, as well as the
configuration of the system where the globus gridftp2 is running. The
symptom was the error:

debug: error reading response from gsiftp://node26.datagrid.cea.fr/node26.datagrid.cea.fr:/pool_node26/dteam/2007-09-25/david102d.2397.0: globus_l_ftp_control_read_cb: Error while searching for end of reply
debug: fault on connection to gsiftp://node26.datagrid.cea.fr/node26.datagrid.cea.fr:/pool_node26/dteam/2007-09-25/david102d.2397.0: globus_l_ftp_control_read_cb: Error while searching for end of reply
debug: error reading response from gsiftp://ccxfer13.in2p3.fr:2811//pnfs/in2p3.fr/data/dteam/disk/dapnia/cleroy/ATLAS-filenode20_007: an I/O operation was cancelled
debug: operation complete

error: globus_l_ftp_control_read_cb: Error while searching for end of reply

The above is the client-side error I produced while investigating. It was
reported by the ftp client that was handling the 3rd party copy, an
older globus 2 based client in this case. In production the result
is an FTS error, with a similar error to the above being available
from the FTS server when the transfer job status is queried.

Yours,
David

--
------------------------------------------------------------------------
David Smith   e-mail: David.Smith@cern.ch   tel: +41 22 76 70677
Address: D. Smith, CERN G06210, Bat 28 1-015, 1211 Geneva 23, Switzerland
------------------------------------------------------------------------
Download (untitled) / with headers
text/plain 643b
We got a comment from Michael Link (from Globus) today.

http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=5590
> That change looks good to me. I'll commit a fix for that and release an
> update package soon.
>
> The changes you noted in the current CVS trunk have the side effect of
> fixing the problem as well, but they were designed for better threaded
> performance along with other changes throughout the package, so they
> aren't really suitable for porting to the 4.0.x branch.
>
> Thanks for reporting this.

I'll see what his update contains, then we can produce an updated Globus
for VDT 1.6.1 and get it to you ASAP.

-alain
CC: litmaath@cern.ch, john.white@cern.ch, s.burke@rl.ac.uk
Subject: Re: [vdt-support #2989] [bug #29930] gridftp2 server can send truncated control channel messages
Date: Tue, 09 Oct 2007 11:06:36 +0200
To: vdt-support@OPENSCIENCEGRID.ORG
From: David Smith <David.Smith@cern.ch>
Download (untitled) / with headers
text/plain 1.5k

On Oct 9, 2007, at 12:17 AM, Alain Roy via RT wrote:

> We got a comment from Michael Link (from Globus) today.
>
> http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=5590
>> That change looks good to me. I'll commit a fix for that and release an
>> update package soon.
>>
>> The changes you noted in the current CVS trunk have the side effect of
>> fixing the problem as well, but they were designed for better threaded
>> performance along with other changes throughout the package, so they
>> aren't really suitable for porting to the 4.0.x branch.
>>
>> Thanks for reporting this.
>
> I'll see what his update contains, then we can produce an updated Globus
> for VDT 1.6.1 and get it to you ASAP.

Hello Alain,

That's great news. Thanks for following that up with globus (as well
as the subsequent work that is needed to make a VDT update).

By the way, I opened another bug, savannah bug
https://savannah.cern.ch/bugs/?30106, which is also a globus issue. I only
describe that as 'normal' severity. But I wanted to check that you'd
seen that I copied vdt-support on it. It is probably another small,
localized fix - but again it will need to be checked with globus to
see if they agree or have other comments.

Yours,
David

--
------------------------------------------------------------------------
David Smith   e-mail: David.Smith@cern.ch   tel: +41 22 76 70677
Address: D. Smith, CERN G06210, Bat 28 1-015, 1211 Geneva 23, Switzerland
------------------------------------------------------------------------
Download (untitled) / with headers
text/plain 132b
This was released to EGEE as a 1.6.1 update, but not in the Pacman
1.6.1. It's in 1.8.1 (where it came as an update) and forward.
Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Fri, 18 Jan 2008 12:55:52 +0100
To: EGEE JRA1 Test Team <project-lcg-deployment-ct@cern.ch>, Maarten Litmaath <litmaath@cern.ch>, Juha Herrala <juha.herrala@cern.ch>, David Smith <David.Smith@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [Juha Herrala]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Update of bug #29930 (project jra1mdw):

Status: Ready for Test => Ready for Review


==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 15:18
Status: Ready for Review
Open/Closed: Open
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: egeetest
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Fri, 18 Jan 2008 12:55:40 +0100
To: Maarten Litmaath <litmaath@cern.ch>, Juha Herrala <juha.herrala@cern.ch>, David Smith <David.Smith@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [Juha Herrala]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Update of bug #29930 (project jra1mdw):

Status: Accepted => In progress


==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 15:18
Status: In progress
Open/Closed: Open
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: maart
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Fri, 18 Jan 2008 12:55:44 +0100
To: Maarten Litmaath <litmaath@cern.ch>, Juha Herrala <juha.herrala@cern.ch>, David Smith <David.Smith@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [Juha Herrala]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Update of bug #29930 (project jra1mdw):

Status: In progress => Integration Candidate


==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 15:18
Status: Integration Candidate
Open/Closed: Open
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: maart
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Fri, 18 Jan 2008 12:55:35 +0100
To: Maarten Litmaath <litmaath@cern.ch>, Juha Herrala <juha.herrala@cern.ch>, David Smith <David.Smith@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [Juha Herrala]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Update of bug #29930 (project jra1mdw):

Status: None => Accepted


==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 15:18
Status: Accepted
Open/Closed: Open
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: maart
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Fri, 18 Jan 2008 12:55:48 +0100
To: EGEE JRA1 Test Team <project-lcg-deployment-ct@cern.ch>, Maarten Litmaath <litmaath@cern.ch>, Juha Herrala <juha.herrala@cern.ch>, David Smith <David.Smith@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [Juha Herrala]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Update of bug #29930 (project jra1mdw):

Status: Integration Candidate => Ready for Test


==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 15:18
Status: Ready for Test
Open/Closed: Open
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: egeetest
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

_______________________________________________________


Hello,

We recently had a problem reported by a site using a 64bit, SL4 release of
DPM (and hence the gridftp2 server) that was traced to an internal
globus/gridftp2 lib issue:

from VDT RPM vdt_globus_data_server-VDT1.6.0x86_64_rhas_4-3
in library(s) libglobus_gridftp_server_control_*

function globus_l_xio_gssapi_ftp_write(), from source file:
globus_xio_gssapi_ftp.c at approximately line 2590:

    res = globus_xio_driver_pass_write(
        op,
        &handle->auth_write_iov,
        1,
        length,
        cb,
        handle);

Here the argument 'length' is the length of the unwrapped (unencrypted),
binary (not base64) version of the message, whereas
handle->auth_write_iov.iov_len holds the actual length of the data to be sent.
Thus 'length' is about 75% of the actual length of the message that the
function wants to send. The fourth argument to globus_xio_driver_pass_write()
is the 'wait_for' parameter, which sets the minimum amount of data that must
have been sent to the kernel before the callback is called. So if the kernel
accepts more than about 75% of the message, but less than all of it, the
message is truncated.
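
The arithmetic behind the 75% figure, and the way a too-small 'wait_for' can
cut a message short, can be modelled with a small stand-alone C sketch. This is
not Globus code: base64_len() and send_at_least() are hypothetical helpers, and
the per-call limit 'chunk' is an assumed stand-in for however much the kernel
happens to accept in one write.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the padded base64 text that encodes n binary bytes. */
static size_t base64_len(size_t n)
{
    return 4 * ((n + 2) / 3);
}

/*
 * Hypothetical model of a "write at least wait_for bytes" primitive.
 * The simulated kernel accepts at most `chunk` bytes per call. The loop
 * stops as soon as wait_for bytes have gone out, which is the point at
 * which the completion callback would fire. Returns the bytes sent.
 */
static size_t send_at_least(size_t total, size_t wait_for, size_t chunk)
{
    size_t sent = 0;

    if (chunk == 0)
        return 0; /* nothing can ever be accepted; avoid looping forever */

    while (sent < wait_for && sent < total) {
        size_t n = total - sent;
        if (n > chunk)
            n = chunk;
        sent += n;
    }
    return sent;
}
```

For a 300-byte binary message, base64_len(300) is 400. Calling
send_at_least(400, 300, 350) returns 350, so the last 50 bytes are never sent,
whereas send_at_least(400, 400, 350) returns 400.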

Browsing the globus CVS, I noticed that recent versions/branches of
globus_xio_gssapi_ftp.c have been reworked and probably do not suffer from
this problem - although I don't know whether exactly this problem was ever
recognised explicitly. I don't know if there are any globus/gridftp2 releases
based on the new code, but we are likely to run into this bug again, so it is
very desirable to have it fixed - either as a fix to the older version or by
moving to the newer code.

Thank you,
David

_______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: 2007-10-01 09:02 By: Maarten Litmaath <maart>

Dear VDT Support,
please look into this bug considered critical for WLCG/EGEE.


-------------------------------------------------------
Date: 2007-09-28 08:53 By: David Smith <dhsmith>
Hello,

Moved to critical, as I expect this problem will lead to service instability
wherever we deploy gridftp2, which is an increasing number of the DPM
sites.

David





_______________________________________________________

Carbon-Copy List:

CC Address | Comment
------------------------------------+-----------------------------
522 | -UPD-
488 | -COM-
vdt-support@opensciencegrid.org |
868 | -UPD-
sburke |
645 | -SUB-




==============================================================================

This item URL is:
<http://savannah.cern.ch/bugs/?29930>

_______________________________________________
Message sent via/by LCG Savannah
http://savannah.cern.ch/
Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Wed, 13 Feb 2008 10:04:43 +0100
To: EGEE JRA1 Test Team <project-lcg-deployment-ct@cern.ch>, Maarten Litmaath <litmaath@cern.ch>, Juha Herrala <juha.herrala@cern.ch>, David Smith <David.Smith@cern.ch>, FROHNER Akos <Akos.Frohner@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [FROHNER Akos]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Follow-up Comment #3, bug #29930 (project jra1mdw):

Fixed in VDT 1.6.1i:
http://vdt.cs.wisc.edu/releases/1.6.1/release.html

==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 17:18
Status: Ready for Review
Open/Closed: Open
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: egeetest
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

_______________________________________________________



Follow-up Comments:


-------------------------------------------------------
Date: 2008-02-13 10:04 By: FROHNER Akos <szamsu>
Fixed in VDT 1.6.1i:
http://vdt.cs.wisc.edu/releases/1.6.1/release.html

-------------------------------------------------------
Date: 2007-10-01 11:02 By: Maarten Litmaath <maart>

Dear VDT Support,
please look into this bug considered critical for WLCG/EGEE.


-------------------------------------------------------
Date: 2007-09-28 10:53 By: David Smith <dhsmith>
Hello,

Moved to critical - as I expect this problem will lead to service instability
anywhere where we deploy gridftp2, which is an increasing number of the DPM
sites.

David





_______________________________________________________

Carbon-Copy List:

CC Address | Comment
------------------------------------+-----------------------------
667 | -COM-
522 | -UPD-
488 | -COM-
vdt-support@opensciencegrid.org |
868 | -UPD-
sburke |
645 | -SUB-




==============================================================================

This item URL is:
<http://savannah.cern.ch/bugs/?29930>

_______________________________________________
Message sent via/by LCG Savannah
http://savannah.cern.ch/
Subject: [bug #29930] gridftp2 server can send truncated control channel messages
Date: Wed, 20 Feb 2008 10:10:43 +0100
To: EGEE JRA1 Test Team <project-lcg-deployment-ct@cern.ch>, Maarten Litmaath <litmaath@cern.ch>, Juha Herrala <juha.herrala@cern.ch>, David Smith <David.Smith@cern.ch>, FROHNER Akos <Akos.Frohner@cern.ch>, John White <john.white@cern.ch>, Stephen Burke <s.burke@rl.ac.uk>, vdt-support@OPENSCIENCEGRID.ORG
From: "noreply [David Smith]" <noreply-savannah@cern.ch>
This is an automated notification sent by LCG Savannah.
It relates to:
bugs #29930, project gLite Middleware

==============================================================================
LATEST MODIFICATIONS of bugs #29930:
==============================================================================

Update of bug #29930 (project jra1mdw):

Status: Ready for Review => Fixed

_______________________________________________________

Follow-up Comment #4:

Hello,

We've verified the relevant globus CVS change is reflected in

vdt_globus_data_server-VDT1.6.1x86_64_rhas_4-6

and that manually provoking the original condition associated with the
problem (i.e. with a debugger) now gives good behavior. The site which was
experiencing the fault is no longer seeing any problem - even with the old
software. Nevertheless, I believe this bug can be considered fixed now.

Thanks to those involved in getting this change made.

David

==============================================================================
OVERVIEW of bugs #29930:
==============================================================================

URL:
<http://savannah.cern.ch/bugs/?29930>

Summary: gridftp2 server can send truncated control channel
messages
Project: gLite Middleware
Submitted by: dhsmith
Submitted on: 2007-09-27 17:18
Status: Fixed
Open/Closed: Closed
Category: Globus
Severity: 6 - Critical
Baseline Release: Unknown
OS: None
Architecture: None
Bug detection area: Production
Assigned to: egeetest
Priority: 5 - Enhancement
GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
Component tag(s):
Subsystem tag(s):
Discussion Lock: Any
Build environment: None
Release:

_______________________________________________________








_______________________________________________________

Carbon-Copy List:

CC Address | Comment
------------------------------------+-----------------------------
667 | -COM-
522 | -UPD-
488 | -COM-
vdt-support@opensciencegrid.org |
868 | -UPD-
sburke |
645 | -SUB-




==============================================================================

This item URL is:
<http://savannah.cern.ch/bugs/?29930>

_______________________________________________
Message sent via/by LCG Savannah
http://savannah.cern.ch/
Download (untitled) / with headers
text/plain 1.6k
Great! We'll move this into new versions of the VDT as well then.

Thanks,
-alain

> We've verified the relevant globus CVS change is reflected in
>
> vdt_globus_data_server-VDT1.6.1x86_64_rhas_4-6
>
> and that manually (i.e. with a debugger) provoking the original condition
> associated with the problem now gives good behavior. The site which was
> experiencing the fault is no longer seeing any problem - even with the old
> software. Nevertheless, I believe this bug can be considered fixed now.
>
> Thanks to those involved in getting this change made.
>
> David
>
> ==============================================================================
> OVERVIEW of bugs #29930:
> ==============================================================================
>
> URL:
> <http://savannah.cern.ch/bugs/?29930>
>
> Summary: gridftp2 server can send truncated control channel messages
> Project: gLite Middleware
> Submitted by: dhsmith
> Submitted on: 2007-09-27 17:18
> Status: Fixed
> Open/Closed: Closed
> Category: Globus
> Severity: 6 - Critical
> Baseline Release: Unknown
> OS: None
> Architecture: None
> Bug detection area: Production
> Assigned to: egeetest
> Priority: 5 - Enhancement
> GGUS reference URL: https://gus.fzk.de/ws/ticket_info.php?ticket=26868
> Component tag(s):
> Subsystem tag(s):
> Discussion Lock: Any
> Build environment: None
> Release: