StreamSets Data Collector / SDC-15919

Failed to get ctime: Directory spooler concatenates directory path twice


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4 (Minor Impact)
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.20.0
    • Component/s: None
    • Labels:
    • Environment:
      • SDC 3.18.1
    • Testing Status:
      STF Testing Required
    • Team:
      Data Plane
    • Stage:
      Hadoop FS Standalone origin
    • STF Test:

      Description


      In the Hadoop FS Standalone origin, I configured:

      Files Directory -> /tmp/test/*_test/

      Read Order -> Last Modified Timestamp

      In HDFS, the source directory looks like this:

      [root@node-1 tmp]# hdfs dfs -ls -R /tmp/test
      drwxrwxrwx   - root supergroup          0 2020-10-06 09:59 /tmp/test/ranjith_test
      -rwxrwxrwx   1 root supergroup         18 2020-10-06 09:59 /tmp/test/ranjith_test/ranjith_test.csv
      [root@node-1 tmp]#
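For reference, the wildcard in the Files Directory setting behaves like an ordinary glob, so `/tmp/test/*_test` matches the `ranjith_test` directory above. A quick sketch using Python's `fnmatch` (illustrative glob semantics only, not SDC's internal matcher):

```python
from fnmatch import fnmatch

# Illustrative only: standard glob semantics, not SDC's matcher.
pattern = "/tmp/test/*_test"
print(fnmatch("/tmp/test/ranjith_test", pattern))  # True  -> directory is picked up
print(fnmatch("/tmp/test/other_dir", pattern))     # False -> directory is skipped
```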
      

      Metrics and the destination show that the file is read correctly, but at the same time errors appear in the logs:

      2020-10-06 10:01:15,438 [user:*admin] [pipeline:hadoop origin/hadooporieab7bf88-3aab-4115-abcd-b61c98b36786] [runner:] [thread:Spool Directory Runner - 0] ERROR SpoolDirUtil - Failed to get ctime: 'ranjith_test.csv'java.io.FileNotFoundException: File does not exist: /tmp/test/*_test/tmp/test/ranjith_test/ranjith_test.csv 
       at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500) 
       at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493) 
       at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) 
       at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508) 
       at com.streamsets.pipeline.stage.origin.hdfs.spooler.HdfsFileSystem.getLastModifiedTime(HdfsFileSystem.java:89) 
       at com.streamsets.pipeline.lib.dirspooler.SpoolDirUtil.compareFiles(SpoolDirUtil.java:51)

       If you look at the path in the exception (i.e., /tmp/test/*_test/tmp/test/ranjith_test/ranjith_test.csv), you can see that the directory path is concatenated twice.
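A minimal sketch of how such a doubled path can arise (illustrative only, not the actual SDC code): the spooler already holds the file's full absolute path, but joins it onto the configured spool directory as if it were a bare file name.

```python
import posixpath

spool_dir = "/tmp/test/*_test"
# The spooler already has the file's full absolute path:
file_path = "/tmp/test/ranjith_test/ranjith_test.csv"

# Buggy join: prepending the spool directory to an already-absolute path
buggy = spool_dir + "/" + file_path.lstrip("/")
print(buggy)  # /tmp/test/*_test/tmp/test/ranjith_test/ranjith_test.csv

# One possible correct join: use only the file's base name
fixed = posixpath.join(spool_dir, posixpath.basename(file_path))
print(fixed)  # /tmp/test/*_test/ranjith_test.csv
```

The buggy join reproduces exactly the path seen in the FileNotFoundException above.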

      A similar issue was already discussed in https://issues.streamsets.com/browse/SDC-12359.
      I can reproduce this issue on SDC 3.16.0 as well.

      People

      Assignee:
      Konstantin Golub
      Reporter:
      Ranjith Pulluru
      Votes:
      0
      Watchers:
      2

      Dates

      Created:
      Updated:
      Resolved: