Document Scope: a VMS/OpenVMS application programmer's view of text storage
On a DCL session (VMS or OpenVMS) use EDT (edit/edt) or EVE (edit/tpu) to create a 10-line RMS-based text file that looks like
this. Make sure you have no trailing spaces, no embedded control characters, and no blank lines (so do not hit <enter> after
typing in the last line; just save then exit).
p.s. for this first demo I will use the DCL command "create" in case you do not know how to use either EDIT/EDT or EDIT/EVE
Legend: <ur> = user response
        <sr> = system response
        <enter> = hit the enter key
        <ctrl z> = hit control Z
-----------------------------------
<sr> $
<ur> create yada.txt<enter>
     1234567890<enter>
     123456789<enter>
     12345678<enter>
     1234567<enter>
     123456<enter>
     12345<enter>
     1234<enter>
     123<enter>
     12<enter>
     1<ctrl z>
<sr> Exit
     $
Now inspect file attributes using the DCL command: DIRECTORY/FULL
Notice that the record format contains the word "variable" but not "stream". This means that your software will view the contents
of this file using the RMS (Record Management Services) library routines built into OpenVMS (this additional processing is usually
not performed by C/C++ programs which can be a source of confusion on VMS/OpenVMS systems)
<sr> $
<ur> dir/full yada.txt
<sr> Directory CSMIS$USER3:[ADMCSM.NEIL]

YADA.TXT;5                    File ID:  (320,23,0)
Size:            1/9          Owner:    [NEIL]
Created:     2-JAN-2005 14:54:05.35
Revised:     2-JAN-2005 14:54:05.40 (2)
Expires:     <None specified>
Backup:      <No backup recorded>
Effective:   <None specified>
Recording:   <None specified>
Accessed:    <None specified>
Attributes:  <None specified>
Modified:    <None specified>
Linkcount:   1
File organization:  Sequential
Shelved state:      Online
Caching attribute:  Writethrough
File attributes:    Allocation: 9, Extend: 0, Global buffer count: 0, No version limit
Record format:      Variable length, maximum 255 bytes, longest 10 bytes    See note #1
Record attributes:  Carriage return carriage control                        See note #2
RMS attributes:     None
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:RWED, World:RWE
Access Cntrl List:  None
Client attributes:  None

Total of 1 file, 1/9 blocks.

Notes:
1. Variable means each record uses a length indicator
2. Means RMS will append <cr> and <lf> to each record after retrieval
Inspect other file attributes by using ANALYZE/RMS
<sr> $
<ur> ana/rms yada.txt
<sr> Check RMS File Integrity 2-JAN-2005 14:58:39.29 Page 1
CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5

FILE HEADER
File Spec: CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5
File ID: (320,23,0)
Owner UIC: [NEIL]
Protection: System: RWED, Owner: RWED, Group: RWED, World: RWE
Creation Date: 2-JAN-2005 14:54:05.35
Revision Date: 2-JAN-2005 14:54:05.40, Number: 2
Expiration Date: none specified
Backup Date: none posted
Contiguity Options: none
Performance Options: none
Reliability Options: none
Journaling Enabled: none

RMS FILE ATTRIBUTES
File Organization: sequential
Record Format: variable
Record Attributes: carriage-return
Maximum Record Size: 255
Longest Record: 10
Blocks Allocated: 9, Default Extend Size: 0
End-of-File VBN: 1, Offset: %X'0050' 80            See note #1
File Monitoring: disabled
File Length Hint (Record Count): 10                See note #2
File Length Hint (Data Byte Count): 55             See note #3
Global Buffer Count: 0

The analysis uncovered NO errors.
ANA/RMS YADA.TXT

Notes:
1. this file's EOF marker is at byte # 80
2. this is the number of lines in my file
3. this is the actual stored byte count without padding, length counts, etc.
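The numbers in notes 1 and 3 can be cross-checked with a little arithmetic. The Python sketch below is only an illustration (nothing here is an RMS call). It assumes each variable-length record is stored as a 2-byte length count, followed by the data, followed by one pad byte whenever the data length is odd, which is the layout a DUMP of the file shows. The name stored_size is hypothetical.

```python
def stored_size(data_len):
    """Bytes one record occupies on disk under the assumed layout:
    2-byte length count + data + 1 pad byte if the data length is odd."""
    return 2 + data_len + (data_len % 2)

# The ten lines typed into yada.txt are 10, 9, 8 ... 1 characters long.
lengths = list(range(10, 0, -1))

record_count = len(lengths)                          # ANALYZE/RMS record count hint
data_byte_count = sum(lengths)                       # data only, no counts or padding
eof_offset = sum(stored_size(n) for n in lengths)    # where the EOF marker lands

print(record_count, data_byte_count, eof_offset, hex(eof_offset))
```

Under those assumptions the three values come out as 10, 55 and 80 (hex 50), matching the ANALYZE/RMS report.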
Now use the DCL command "DUMP" to see how your data was stored in the RMS file on disk
<sr> $
<ur> dump yada.txt
<sr> Dump of file CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5 on 2-JAN-2005 14:54:57.07
File ID (320,23,0) End of file block 1 / Allocated 9
Virtual block number 1 (00000001), 512 (0200) bytes
<<<--- read this way ---|--- read this way --->>>
36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0
The first three lines of the same dump, annotated (read the hex from right to left):

36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040

Notes:
1. 000A, 0009, 0008 ... 0003, 0002, 0001 <--- record length in bytes (not including padding)
2. 31 32 33 ... <--- data characters
3. 00 after each odd-length record <--- padding to word-align length data
4. FFFF <--- FFFF and null to EOF means no more data
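To make the layout concrete, here is a minimal Python sketch that walks the first 84 bytes of the block exactly as annotated above. It is an illustration only: the function name parse_variable_records is hypothetical, the hex string is the dump rewritten in file order (lowest address first), and treating FFFF as "stop" is an assumption based on what this particular dump shows.

```python
import struct

def parse_variable_records(block):
    """Return (records, eof_offset) for a block of variable-length records."""
    records = []
    pos = 0
    while pos + 2 <= len(block):
        (length,) = struct.unpack_from("<H", block, pos)  # 16-bit little-endian count
        if length == 0xFFFF:           # end-of-data marker seen in the dump
            break
        pos += 2
        records.append(block[pos:pos + length])
        pos += length + (length % 2)   # skip the pad byte after odd-length data
    return records, pos

# Bytes of YADA.TXT in file order, taken from the DUMP output above.
block = bytes.fromhex(
    "0A00 31323334353637383930 "
    "0900 313233343536373839 00 "
    "0800 3132333435363738 "
    "0700 31323334353637 00 "
    "0600 313233343536 "
    "0500 3132333435 00 "
    "0400 31323334 "
    "0300 313233 00 "
    "0200 3132 "
    "0100 31 00 "
    "FFFF 0000"
)

records, eof_offset = parse_variable_records(block)
for r in records:
    print(len(r), r.decode("ascii"))
print("EOF marker at byte", eof_offset)
```

The walk stops at byte 80, the same End-of-File offset that ANALYZE/RMS reported.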
Now use EDIT/EDT or EDIT/EVE to create a second text file on VMS or OpenVMS.
Executing ANA/RMS on this file shows the EOF marker at hex 10 (16), which is where you see the four 'F' characters (FFFF) in the dump below (put there by the editor, not RMS).
00000000 00000000 00000000 0000FFFF 00000041 00010000 00070001 00000001 ............A................... 000000
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000020

Notes (read the hex from right to left):
1. 0001, 0001 and 00010000 <--- record length in bytes (not including padding)
2. 00 = <nul>, 07 = <bel>, 41 = "A" <--- data characters
3. 00 after each data character <--- padding to word-align the length data
4. 0000FFFF followed by 00000000 00000000 00000000 <--- means no more data
! first we need a foreign command (two are provided)
$ rfmvar :== convert/fdl=nla0:
$ rfmstmlf :== convert/fdl="""record; format stream_lf"""
! now we use the foreign command to convert the file to stream_lf
$ rfmstmlf targetfile.txt
<sr> $
<ur> cre stream_lf.dat<enter>                                              ! create a file
     <ctrl-Z>
<sr> $
<ur> set file stream_lf.dat /attr=(rfm:stmlf,lrl:32767,mrs:0,rat:cr)       ! DCL cmd to set stream=lf with <cr> records
<sr> $
<ur> ana/rms/fdl/output=stream_lf.fdl stream_lf.dat                        ! create an FDL
<sr> $
<ur> convert/create/fdl=stream_lf.fdl yada.txt yada_lf.txt                 ! convert previous file into stream lf
<sr> $
<ur> dump yada_lf.txt                                                      ! dump file to terminal in ASCII and hex
<sr> $
<ur> ana/rms yada_lf.txt                                                   ! analyze resultant file
<sr> Check RMS File Integrity 3-JAN-2005 07:02:57.60 Page 1
CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5

FILE HEADER
File Spec: CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5
File ID: (479,34,0)
Owner UIC: [NEIL]
Protection: System: RWED, Owner: RWED, Group: RWED, World: RWE
Creation Date: 3-JAN-2005 00:20:23.92
Revision Date: 3-JAN-2005 00:20:23.96, Number: 2
Expiration Date: none specified
Backup Date: none posted
Contiguity Options: none
Performance Options: none
Reliability Options: none
Journaling Enabled: none

RMS FILE ATTRIBUTES
File Organization: sequential
Record Format: stream-LF                           Note: means each record is terminated with <lf>
Record Attributes: carriage-return                 Note: means add a <cr> to each record after retrieval
Maximum Record Size: 255
Longest Record: 10
Blocks Allocated: 9, Default Extend Size: 0
End-of-File VBN: 1, Offset: %X'0041' 65            Note: this file's EOF marker is at byte # 65
File Monitoring: disabled
Global Buffer Count: 0

The analysis uncovered NO errors.
ANA/RMS yada_lf.TXT
$ dump yada_lf.txt
32310A38 37363534 3332310A 39383736 35343332 310A3039 38373635 34333231 1234567890.123456789.12345678.12 000000
310A3231 0A333231 0A343332 310A3534 3332310A 36353433 32310A37 36353433 34567.123456.12345.1234.123.12.1 000020
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000A ................................ 000040

Notes (read the hex from right to left):
1. 31 32 33 ... <--- ASCII data characters
2. 0A <--- the <lf> that terminates each record
3. The final <lf> is the 0A at byte hex 40, so EOF is also located at byte hex 41 (65), the value reported by ANALYZE/RMS
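For comparison with the variable-length layout, here is a minimal Python sketch of what the stream_LF file holds: nothing but the data bytes of each record followed by a single <lf>. It is an illustration, not what CONVERT actually runs, and the helper name to_stream_lf is hypothetical.

```python
def to_stream_lf(records):
    """Join records into a stream_LF byte image: data, then one <lf> per record."""
    return b"".join(r + b"\n" for r in records)

# The ten records of yada.txt: 10 characters down to 1 character.
records = [b"1234567890"[:n] for n in range(10, 0, -1)]

image = to_stream_lf(records)
eof_offset = len(image)               # first byte past the final <lf>

print(eof_offset, hex(eof_offset))    # 55 data bytes + 10 <lf> bytes
print(image[:0x20])                   # first 32 bytes, i.e. the first dump line
```

No length counts and no padding are stored, which is why the EOF offset drops from 80 to 65 for the same ten lines of text.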
The next test uses a small UTF-8 file named utf8-test.txt which contains these two lines of text:
It’s supposed to be “État du texte”.
1234.
Do a "directory/full" command to see the file's attributes
<sr> $
<ur> dir/full utf8-test.txt
<sr> Directory CSMIS$USER3:[ADMCSM.NEIL]

utf8-test.txt;1               File ID:  (5683,193,0)
Size:            1/16         Owner:    [NEIL]
Created:     4-APR-2017 05:24:06.38
Revised:     4-APR-2017 05:32:33.90 (4)
Expires:     <None specified>
Backup:      <No backup recorded>
Effective:   <None specified>
Recording:   <None specified>
Accessed:    <None specified>
Attributes:  4-APR-2017 05:32:33.90
Modified:    4-APR-2017 05:24:06.38
Linkcount:   1
File organization:  Sequential
Shelved state:      Online
Caching attribute:  Writethrough
File attributes:    Allocation: 16, Extend: 0, Global buffer count: 0, No version limit
Record format:      Stream_LF, maximum 0 bytes, longest 32767 bytes
Record attributes:  Carriage return carriage control
RMS attributes:     None
Journaling enabled: None
File protection:    System:RWD, Owner:RWD, Group:RWD, World:RWD
Access Cntrl List:  None
Client attributes:  None

Total of 1 file, 1/16 blocks.
$
Do an "analyze/rms" command to see the file's attributes including the EOF position
<ur> ana/rms utf8-test.txt
<sr> Check RMS File Integrity 4-APR-2017 05:53:37.24 Page 1
CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1
FILE HEADER
File Spec: CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1
File ID: (5683,193,0)
Owner UIC: [NEIL]
Protection: System: RWD, Owner: RWD, Group: RWD, World: RWD
Creation Date: 4-APR-2017 05:24:06.38
Revision Date: 4-APR-2017 05:32:33.90, Number: 4
Expiration Date: none specified
Backup Date: none posted
Contiguity Options: none
Performance Options: none
Reliability Options: none
Journaling Enabled: none
RMS FILE ATTRIBUTES
File Organization: sequential
Record Format: stream-LF
Record Attributes: carriage-return
Maximum Record Size: 0
Longest Record: 32767
Blocks Allocated: 16, Default Extend Size: 0
End-of-File VBN: 1, Offset: %X'0033' Note: EOF is found in block-1 at position 51 (hex 33)
File Monitoring: disabled
Global Buffer Count pre-V8.3: 0
Global Buffer Count post-V8.3: 0
Global Buffer Flags post-V8.3: none
The analysis uncovered NO errors.
ANA/RMS utf8-test.txt
$
Do a plain "analyze" command to analyze the file and test the contents. With no qualifier, ANALYZE treats its input as an object file (note the "Analyze Object File" banner), which is why it complains about ordinary text.
<ur> ana utf8-test.txt
<sr> Analyze Object File 4-APR-2017 05:53:29.0 Page 1
CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1
ANALYZ I01-55
*** Object record 1 contains invalid type code 73:        73=x49 so why complain?
 7  6  5  4  3  2  1  0         01234567
------------------------       --------
73 20 73 99 80 E2 74 49| 0000 |Itâ..s s|
20 64 65 73 6F 70 70 75| 0008 |upposed |
80 E2 20 65 62 20 6F 74| 0010 |to be â.|
64 20 74 61 74 89 C3 9C| 0018 |.Ã.tat d|
E2 65 74 78 65 74 20 75| 0020 |u texteâ|
               2E 9D 80| 0028 |...     |
*** Object record 2 contains invalid type code 49:        49=x31 so why complain?
 7  6  5  4  3  2  1  0         01234567
------------------------       --------
         2E 34 33 32 31| 0000 |1234.   |
*** Object record 3 has a length of zero.

Analyze Object File 4-APR-2017 05:53:29.0 Page 2
CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1
ANALYZ I01-55
SUMMARY STATISTICS:
Record Type     Count   Total Bytes
OBJ$C_DBG       0       0
OBJ$C_TBT       0       0
EOBJ$C_EMH      0       0
EOBJ$C_EEOM     0       0
EOBJ$C_EGSD     0       0
EOBJ$C_ETIR     0       0
EOBJ$C_EDBG     0       0
EOBJ$C_ETBT     0       0
Totals          0       0

The analysis uncovered 3 errors.        (not true)
ANA utf8-test.txt
$
Here is a short table of the special codes we expect to find in the file dump:
Character | Unicode code point | UTF-8 equivalent |
---|---|---|
’ | x2019 | e2 80 99 |
“ | x201c | e2 80 9c |
É | xc9 | c3 89 |
” | x201d | e2 80 9d |
<lf> | x0a | 0a |
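If you would like to verify the table rather than take it on faith, the following minimal Python sketch encodes each character and prints the resulting bytes. The helper name utf8_hex is hypothetical; str.encode and bytes.hex are standard Python (the separator argument to hex needs Python 3.8 or later).

```python
def utf8_hex(ch):
    """Return the UTF-8 encoding of one character as spaced hex digits."""
    return ch.encode("utf-8").hex(" ")

# The five characters from the table, written as escapes to avoid any
# ambiguity about how this source file itself is encoded.
samples = ["\u2019", "\u201c", "\u00c9", "\u201d", "\n"]

for ch in samples:
    print(f"x{ord(ch):04x} -> {utf8_hex(ch)}")
```

Code points below x80 (like <lf>) stay as one byte, É needs two bytes, and the three "smart" punctuation marks need three bytes each.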
Optionally, do a "dump" command
<ur> dump utf8-test.txt
<sr> Dump of file CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1 on 4-APR-2017 06:02:40.62
File ID (5683,193,0) End of file block 1 / Allocated 16
Virtual block number 1 (00000001), 512 (0200) bytes
<-- bytes | text -->
64207461 7489C39C 80E22065 62206F74 20646573 6F707075 73207399 80E27449 Itâ..s supposed to be â..Ã.tat d 000000
00000000 00000000 00000000 000A0A2E 34333231 0A2E9D80 E2657478 65742075 u texteâ....1234................ 000020
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000040
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0
$
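Because DUMP prints the hex with the lowest address at the far right, it can be hard to read multi-byte UTF-8 sequences by eye. The minimal Python sketch below turns the hex columns of the first two dump lines back into file order and decodes them. The function name dump_line_to_bytes is hypothetical, and the cut-off at hex 33 is the End-of-File offset that ANALYZE/RMS reported for this file.

```python
def dump_line_to_bytes(hex_columns):
    """Convert the hex part of one DUMP line into bytes in file order.

    DUMP shows the lowest address on the right, so reversing the whole
    byte sequence of the line restores the order stored on disk.
    """
    return bytes.fromhex(hex_columns.replace(" ", ""))[::-1]

line_000000 = "64207461 7489C39C 80E22065 62206F74 20646573 6F707075 73207399 80E27449"
line_000020 = "00000000 00000000 00000000 000A0A2E 34333231 0A2E9D80 E2657478 65742075"

data = dump_line_to_bytes(line_000000) + dump_line_to_bytes(line_000020)
eof_offset = 0x33                      # from ANALYZE/RMS: End-of-File offset
text = data[:eof_offset].decode("utf-8")

print(repr(text))
```

Decoded as UTF-8, the odd-looking "â.." groups in the ASCII column of the dump turn back into the apostrophe, the quotation marks and the É.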
Just about every operating system uses its own peculiar way to store text data.
Text is stored in files using these two data formats:
Format | Notes |
---|---|
ASCII | |
EBCDIC | seen on older IBM mainframes and IBM minicomputers |
They each employ one of these EOL (end-of-line) markers:
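EOL Marker | Notes |
---|---|
<cr> | Seen in classic Mac OS (before OS X) |
<lf> | Seen in UNIX and UNIX-like systems (including Linux and macOS) |
<cr><lf> | Seen in DOS, Windows, and most Internet text protocols |
<lf><cr> | |
<ctrl-Z> | Seen in some CP/M systems (marks the end of the text rather than the end of each line) |
<ctrl-^> | Seen in older QNX systems |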
If you don't believe me then consider the following problem often seen on Windows platforms. Opening a text file with NOTEPAD works intermittently, but if you see junk on the screen then reopening the file with WORDPAD almost always works. How can this be? Well, the authors of WORDPAD put some special logic into their app to take care of foreign-formatted text files. Excel can do this too when importing data from text files containing either CSV or XML data.
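The "special logic" is not published anywhere I know of, but the general idea of guessing a file's line-ending convention can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are hypothetical, only <cr>, <lf> and <cr><lf> are considered, and a file using <lf><cr> would be miscounted.

```python
def count_eol_markers(data):
    """Count <cr><lf> pairs, lone <cr> and lone <lf> bytes in a buffer."""
    crlf = data.count(b"\r\n")
    return {
        "<cr><lf>": crlf,
        "<cr>": data.count(b"\r") - crlf,   # carriage returns not part of a pair
        "<lf>": data.count(b"\n") - crlf,   # line feeds not part of a pair
    }

def guess_eol(data):
    """Return the most common marker, or None if the buffer has no line ends."""
    counts = count_eol_markers(data)
    marker = max(counts, key=counts.get)
    return marker if counts[marker] else None

print(guess_eol(b"one\r\ntwo\r\n"))
print(guess_eol(b"one\ntwo\n"))
print(guess_eol(b"one\rtwo\r"))
```

A program that guesses first and then splits on the winning marker will display a foreign-formatted file correctly, whereas a program that only knows its own platform's marker shows junk.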
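Back in the day, the people who invented FTP were aware of this problem and so developed ASC (ASCII) Transfer Mode for handling text files. When an FTP connection is placed into ASC mode, the sending system converts each line of text into a common network form (lines ending in <cr><lf>) and the receiving system converts it into its own native text format.
HPFM! (Hocus Pocus - Frickin Magic)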
Unfortunately, the people who developed SFTP (file transfer over SSH/SSH2) chose to do everything as a binary transfer. This means that some files SFTP'd onto a VMS/OpenVMS system may require some post-transfer processing.
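The conversion that FTP's ASCII mode performs (and that has to be done by hand after a binary transfer) can be sketched as two small Python functions. This is a simplified illustration of the idea described in RFC 959, not code from any FTP client: the function names are hypothetical, and the sketch assumes the input consistently uses the one local marker passed in.

```python
NETWORK_EOL = b"\r\n"   # the <cr><lf> form used on the wire in ASCII mode

def to_network(local_text, local_eol):
    """Sender side: replace the local line-end marker with <cr><lf>."""
    if local_eol == NETWORK_EOL:
        return local_text
    return local_text.replace(local_eol, NETWORK_EOL)

def from_network(wire_text, local_eol):
    """Receiver side: replace <cr><lf> with the local line-end marker."""
    return wire_text.replace(NETWORK_EOL, local_eol)

# A UNIX-style file sent to a system whose native marker is <cr>.
unix_file = b"line one\nline two\n"
on_the_wire = to_network(unix_file, b"\n")
received = from_network(on_the_wire, b"\r")

print(on_the_wire)
print(received)
```

A binary transfer skips both steps, so the receiving system ends up with the sender's line-end markers untouched.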
References: