StreamSets Data Collector, issue SDC-15725

SFTP Origin does not handle idle case

    Details

    • Type: Task
    • Status: Resolved
    • Priority: P3 (Limited Impact)
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.19.0
    • Component/s: None
    • Testing Status: Manual Testing
    • Testing Description: Manually tested that the stage waits idly for the configured Batch Wait Time (ms) instead of actively polling the SFTP server.
    • Team: Data Plane

      Description

      I have a simple SFTP Origin to Trash pipeline (attached). A Linux Mint 20 environment is running locally on my Mac in VirtualBox, with vsftpd running on Linux Mint.

      I am running SDC 3.17.0 locally on my Mac.

      If there are no files for the SFTP Origin to process in my home directory on the Linux VM, the origin appears to poll about 20 times a second. There does not seem to be any delay when there is no work to do.

      Because this pipeline writes offset.json after every batch, the idle polling generates a large number of IOPS. This leads to the actual point: anyone who deploys this on a GCP VM will have to pay for a lot of IOPS and SSD capacity to support these pipelines even while they are idle.

      In the attached file, I added one line of trace logging in the produce() routine.
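
      To illustrate the effect described above, here is a minimal sketch (not the actual SDC code) that counts empty polls over a time window using a simulated clock. The class name IdlePollSketch, the method idlePolls, and the 50 ms cost per empty poll are assumptions made for illustration; the 50 ms figure is simply chosen to match the observed rate of about 20 polls a second. A batchWaitMs of 0 models the reported behavior, and a non-zero value models an idle wait such as Batch Wait Time (ms).

```java
public class IdlePollSketch {

    /**
     * Counts how many empty polls an idle origin would perform in a window,
     * using a simulated clock (no real sleeping, no real SFTP access).
     *
     * windowMs    length of the observation window in milliseconds
     * pollCostMs  assumed cost of one empty poll (hypothetical value)
     * batchWaitMs idle wait applied after each empty poll; 0 means no wait
     */
    public static long idlePolls(long windowMs, long pollCostMs, long batchWaitMs) {
        if (pollCostMs <= 0 || batchWaitMs < 0) {
            throw new IllegalArgumentException("pollCostMs must be > 0 and batchWaitMs >= 0");
        }
        long now = 0;   // simulated elapsed time
        long polls = 0; // each empty poll is one empty batch, so one offset write
        while (now < windowMs) {
            polls++;
            now += pollCostMs + batchWaitMs;
        }
        return polls;
    }

    public static void main(String[] args) {
        // One-second window, assumed 50 ms per empty poll.
        System.out.println("no idle wait:      " + idlePolls(1000, 50, 0));    // 20
        System.out.println("2000 ms idle wait: " + idlePolls(1000, 50, 2000)); // 1
    }
}
```

      Under these assumptions, each empty poll corresponds to one empty batch and therefore one offset.json write, so adding an idle wait reduces the write rate in the same proportion as the poll rate.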

            People

            Assignee: Sebastian Sanchez (sebas)
            Reporter: bob plotts (bob)