File Transfer Protocol
FTP stands for File Transfer Protocol and is one of the easiest and most widely used methods through which files may be exchanged over the Internet. When a file is downloaded from the Internet, it is actually being transferred from another computer over the Internet to the computer that requested the download. (FTPplanet, n.d.) The broad objectives of FTP include supporting file sharing, encouraging implicit use of remote computers through programs, shielding users from variations in file storage systems among hosts, and transferring data reliably and efficiently. (Postel & Reynolds, 1985) FTP files are accessed through their URL, or Internet address; in most cases this is the only knowledge a person has about a file, without even being aware of where the file is coming from. An FTP address resembles a website address, the difference being the use of FTP:// in place of HTTP://. (FTPplanet, n.d.)
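As a small illustration, an FTP address can be taken apart with Python's standard `urllib.parse` module; the host and path below are placeholders, not real resources:

```python
from urllib.parse import urlparse

# An FTP address looks like a web address, but with the ftp:// scheme
# in place of http://.
ftp_url = urlparse("ftp://ftp.example.com/pub/readme.txt")
web_url = urlparse("http://www.example.com/index.html")

print(ftp_url.scheme)   # 'ftp' -- only the scheme differs from a web URL
print(web_url.scheme)   # 'http'
print(ftp_url.netloc)   # 'ftp.example.com' -- the FTP server (FTP site)
print(ftp_url.path)     # '/pub/readme.txt' -- the file being requested
```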
Usually, a computer that has an FTP address is designed to accept an FTP connection and is called an FTP server or FTP site. An FTP connection is established by using a typical web browser such as Netscape or a program designed specifically for FTP, called an FTP client. An FTP client is software intended for two-way file transfer between two computers over the Internet; it is installed on a computer but works only in the presence of a live Internet connection. (FTPplanet, n.d.)
Working of FTP
FTP is an application-layer protocol of the TCP/IP suite, used for transporting file data from one computer to another over a network. By default it uses two ports: port 20 for the transfer of data and port 21 for commands. Because it communicates over TCP, FTP provides a reliable channel, which is desirable when data is being moved around the Internet. (FTP, n.d.) TCP/IP was created by a Department of Defense (DOD) research project in order to bind various networks of different specifications into a single large network of all the smaller networks, the Internet. TCP ensures that the correct data is delivered from client to server, where the client is the system that makes a request and the server is where the files are placed and the request is fulfilled. Data may sometimes be lost in the intermediate network; TCP not only enables detection of errors or lost data but also triggers retransmission until the data is received correctly and completely. (Gilbert, 1995)
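The reliable, ordered delivery that FTP inherits from TCP can be sketched with a loopback connection using Python's standard `socket` module; this is a minimal echo illustration, not FTP itself:

```python
import socket
import threading

# TCP delivers bytes reliably and in order; FTP relies on this for both
# its control connection (port 21 by default) and its data connection
# (port 20 by default). Here a loopback server simply echoes back what
# it receives, showing the byte stream arrives intact.

MESSAGE = b"hello world"  # 11 bytes

def echo_once(server: socket.socket) -> None:
    conn, _ = server.accept()
    with conn:
        data = b""
        while len(data) < len(MESSAGE):   # read until the full message arrives
            data += conn.recv(1024)
        conn.sendall(data)                # echo it back unchanged

server = socket.create_server(("127.0.0.1", 0))  # OS picks a free port
port = server.getsockname()[1]
threading.Thread(target=echo_once, args=(server,), daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(MESSAGE)
    received = b""
    while len(received) < len(MESSAGE):
        received += client.recv(1024)

server.close()
print(received)  # b'hello world' -- complete and in order
```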
A computer usually interacts with FTP by means of a PI (Protocol Interpreter), which is in fact a component of the FTP client and communicates with the PI on the remote server. When an FTP request or response is sent, it is interpreted by the PI, which determines how to convey the client's request to the server. When data is being moved, the local PI works with a local DTP (Data Transfer Process) while the remote PI works with a remote DTP; the two DTPs act together to carry out the transfer of a file. (FTP, n.d.)
A data connection is used to establish and maintain a data transfer process (DTP), which may be passive or active. The DTPs at the client and the server have to coordinate the transfer of data and, moreover, agree on which part comes after which. In FTP, two byte sizes are important: the transfer byte size and the logical byte size. The transfer byte size is used for data transmission and always remains 8 bits, whereas the logical byte size is the size that FTP defines for a file. Control information is managed through FTP commands, a collection of commands carrying control information that flows between the user-FTP and server-FTP processes. (Postel & Reynolds, 1985)
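A rough sketch of what flows over the control connection — commands as CRLF-terminated ASCII lines, replies prefixed with a three-digit code — might look like this in Python; the helper names are illustrative, not part of any real FTP library:

```python
def ftp_command(verb: str, *args: str) -> bytes:
    """Format an FTP command as a CRLF-terminated ASCII line."""
    line = verb if not args else verb + " " + " ".join(args)
    return (line + "\r\n").encode("ascii")

def parse_reply(line: str) -> tuple[int, str]:
    """Split a server reply into its three-digit code and its text."""
    return int(line[:3]), line[4:]

print(ftp_command("USER", "anonymous"))   # b'USER anonymous\r\n'
print(ftp_command("TYPE", "I"))           # binary ("image") transfer type
code, text = parse_reply("226 Transfer complete.")
print(code)  # 226 -- codes starting with 2 indicate success
```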
A pathname is usually the argument of an FTP service command; it is defined as the character string a user enters into a file system to identify a file, and it usually includes device, directory, and file names. The control connection is a communication path formed between the server-PI and the user-PI for the purpose of exchanging commands and replies. A file transfer function thus comprises a DTP and a PI, with different roles executed by the server-PI and the user-PI on the two sides of the protocol. The control connection is initiated by the user-protocol interpreter, which transmits commands to the server process — one or more processes that carry out the task of transferring files in collaboration with a user-FTP process, and sometimes with another server as well. The FTP commands specify the parameters of the data connection and the nature of the file system operation. The server initiates the data connection and the data transfer in line with the specified parameters; the user-DTP, or its designate, listens on the specified data port.
It is a requirement of the File Transfer Protocol that the control connection remain open while a data transfer is underway; the user is responsible for requesting closure of the control connection, and the server initiates the close once the FTP service is over. Files are transferred only over the data connection, while the control connection is the means of transferring commands, as mentioned before. The data actually moves from a storage device at the sending host to a storage device at the receiving host. Sometimes certain transformations must be carried out on the data, because the two systems involved represent stored data differently; the differences between internal and standard representations create the need for transformation, which is performed by the sending and receiving sites. (Postel & Reynolds, 1985)
The technicalities of data transfer lie in establishing the data connection with the correct ports and selecting transfer parameters. Default data ports exist for the server and user DTPs, and a passive data transfer process "listens" on the data port before it sends a transfer request command. The direction of the transfer is determined by the FTP request command. When the server receives the transfer request, it initiates the data connection to the port. Once the data connection is established, the transfer begins between the DTPs, and confirmation replies are sent from the server-PI to the user-PI. Every data transfer must end with an EOF (End Of File), either explicit or implied by closing the data connection. The TCP connection from the user to the standard server port serves as the channel between the user-PI and the server-PI. The user-PI not only sends FTP commands but also interprets the replies it receives; the server-PI, in turn, sends replies, interprets commands, and directs its DTP to establish the data connection and carry out the transfer. Replies to File Transfer Protocol commands are designed to synchronize requests and actions in the file transfer process and to guarantee that the user process always knows the state of the server. (Postel & Reynolds, 1985)
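As an illustration of the port negotiation described above: in passive mode the server's 227 reply encodes the data-connection address as six numbers, with the port computed as p1 × 256 + p2. This hypothetical Python helper decodes such a reply (the address shown is made up):

```python
import re

def parse_pasv(reply: str) -> tuple[str, int]:
    """Extract the data-connection address from a 227 PASV reply.

    The six numbers are h1,h2,h3,h4,p1,p2: the first four form the
    IP address, and the data port is p1 * 256 + p2.
    """
    nums = re.search(r"\((\d+,\d+,\d+,\d+,\d+,\d+)\)", reply).group(1)
    h1, h2, h3, h4, p1, p2 = map(int, nums.split(","))
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

host, port = parse_pasv("227 Entering Passive Mode (192,168,1,2,19,136).")
print(host, port)  # 192.168.1.2 5000  (19 * 256 + 136 = 5000)
```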
The specification for the File Transfer Protocol (FTP) contains provisions and mechanisms that can be exploited to compromise network security.
One security problem lies in proxy FTP: the FTP specification permits a client to instruct one server to transfer files to a third machine. Another lapse is that the specification places no restrictions on the number of attempts at entering a user's password; as a result of this fault, the door is open to brute-force "password guessing" attacks. The "proxy FTP" mechanism does offer a way of reducing network traffic, since the client instructs one server to transfer a file directly to another, instead of moving it from the first server to the client and from there to the second server; this is very beneficial where a client connects to the network through a slow link such as a modem. But even though proxy FTP is useful, it exposes security issues. (Allman & Ostermann, 1999)
Hypertext Transfer Protocol
HTTP stands for the Hypertext Transfer Protocol. It is an application-level protocol used for transferring data over the Internet. Servers and web browsers exchange information on the basis of rules that comprise a request-response protocol, HTTP. For instance, when a browser registers a request with a server, it does so by initiating a TCP/IP connection. A client request consists of a request line, request headers, and an entity. The response sent by the server contains a status line, response headers, and an entity. The entity that forms part of either the request or the response may be considered simply a payload, possibly binary data; the other items, such as the status line or the request line, are readable ASCII characters. At the end of the response, the TCP/IP connection may be terminated by either the browser or the server, or, if needed, the browser can send another request. (Kristol, n.d.)
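The request/response structure described above can be sketched in Python; the helper functions and the sample response below are illustrative only:

```python
def build_request(method: str, path: str, host: str) -> bytes:
    """Assemble a minimal HTTP/1.1 request: request line, headers, blank line."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",   # blank line separates headers from the (empty) entity
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_response(raw: bytes) -> tuple[int, dict, bytes]:
    """Split a raw response into status code, headers, and entity (payload)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    status = int(status_line.split(" ")[1])
    headers = dict(h.split(": ", 1) for h in header_lines)
    return status, headers, body

print(build_request("GET", "/index.html", "www.example.com"))

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<p>hello</p>\n")
status, headers, body = parse_response(raw)
print(status)                   # 200
print(headers["Content-Type"])  # text/html
```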
HTTP has evolved through different versions. The first, HTTP/0.9, was simply a protocol for the transfer of raw data over the Internet. Then came HTTP/1.0, an improved version in the sense that it allowed a MIME-like format in messages; this made available meta-information about the transferred data and modifiers on the request/response semantics. Later, various incompletely implemented applications proliferated under the name "HTTP/1.0", which demanded a change of protocol version so that two communicating applications could determine each other's true capabilities; to cater to this, the protocol "HTTP/1.1" was defined. To ensure reliable implementation of its features, HTTP/1.1 puts in place more rigorous demands than the former version. HTTP may be utilized as a generic protocol for communication between user agents and proxies or gateways to other Internet systems, including those supported by the SMTP, FTP, and Gopher protocols, among others. This means that HTTP permits basic hypermedia access to resources available through various applications. In HTTP, the client sends a request to the server and receives the server's response in reply. MIME, the message format used in requests and responses, stands for Multipurpose Internet Mail Extensions. When a PDF, a media file, or a text file is opened in the browser, the server responds with the MIME type identifying what kind of file it is; the client likewise uses MIME types to state what kinds of media it can handle. (Fielding et al., 1999)
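Python's standard `mimetypes` module maps file names to the MIME types mentioned above, which is roughly what a server consults when labelling an entity in the Content-Type header:

```python
import mimetypes

# The server labels the entity with a MIME type in the Content-Type
# header; the client advertises the media types it can handle in its
# Accept header. The file names below are placeholders.
print(mimetypes.guess_type("report.pdf")[0])   # application/pdf
print(mimetypes.guess_type("photo.jpeg")[0])   # image/jpeg
print(mimetypes.guess_type("notes.txt")[0])    # text/plain
```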
Most HTTP communication is initiated by a user agent and consists of a request applied to a resource on an origin server. In the simplest case, the user agent and the origin server establish a single connection between themselves. More complicated situations, however, involve intermediaries in the request/response chain. The common intermediaries are the proxy, the gateway, and the tunnel. A proxy is a forwarding agent: it receives requests for a URI in its complete form, rewrites the message, and forwards the reformatted request toward the server identified by the URI. A gateway, on the other hand, is a receiving agent that acts as a layer over another server (or servers) and translates requests into the protocol of the underlying server. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used mostly when communication must pass through an intermediary that cannot comprehend the message contents. These are the three main intermediaries, but at times a cache is also used, which aims to shorten the request/response chain when one of the participants holds a cached response applicable to a request. Caches, however, apply productively only to some responses, since requests containing modifiers sometimes place special requirements on cache behavior. (Fielding et al., 1999)
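The difference in how a proxy sees a request can be sketched as follows: a proxy receives the URI in its complete (absolute) form, while an origin server normally receives only the path portion. The helper below is a hypothetical illustration:

```python
from urllib.parse import urlparse

def request_line(method: str, url: str, via_proxy: bool) -> str:
    """Build the request line for a direct request or one sent via a proxy.

    A proxy receives the URI in its complete (absolute) form so it knows
    which server to forward to; an origin server normally receives only
    the path portion of the URI.
    """
    if via_proxy:
        return f"{method} {url} HTTP/1.1"
    parsed = urlparse(url)
    return f"{method} {parsed.path or '/'} HTTP/1.1"

print(request_line("GET", "http://www.example.com/index.html", via_proxy=True))
# GET http://www.example.com/index.html HTTP/1.1
print(request_line("GET", "http://www.example.com/index.html", via_proxy=False))
# GET /index.html HTTP/1.1
```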
HTTP communication occurs over TCP/IP connections on the default TCP port 80, yet use may be made of other ports when deemed necessary. HTTP only presumes a reliable transport, and any protocol that can provide such a guarantee can be used for communication. HTTP identifies resources through URIs (Uniform Resource Identifiers), known variously as Universal Document Identifiers (UDI), WWW addresses, Uniform Resource Names (URN), and Uniform Resource Locators (URL). URIs are formatted strings that identify a resource through characteristics such as name or location. Most HTTP responses contain an entity with information meant to be interpreted by a human user, and it is preferable that users be provided with the most appropriate entity corresponding to a request. For caches and servers this creates conflicts, since not all users have the same preferences about what is most appropriate, nor the same capability for interpreting all entity types. To resolve such conflicts, HTTP provides mechanisms that define a process for choosing the most appropriate representation for a given response when multiple representations are available. (Fielding et al., 1999)
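The selection among multiple representations can be illustrated with a hypothetical helper that parses an Accept header's quality (q) values and picks the best available representation; q defaults to 1.0 when omitted:

```python
def best_match(accept: str, available: list[str]) -> str:
    """Pick the available representation with the highest quality (q)
    value that the client will accept; q defaults to 1.0 when omitted."""
    prefs = {}
    for part in accept.split(","):
        fields = part.strip().split(";")
        media = fields[0].strip()
        q = 1.0
        for field in fields[1:]:
            if field.strip().startswith("q="):
                q = float(field.strip()[2:])
        prefs[media] = q
    return max(available, key=lambda m: prefs.get(m, 0.0))

accept = "text/html, application/xml;q=0.9, text/plain;q=0.5"
print(best_match(accept, ["text/plain", "application/xml"]))  # application/xml
print(best_match(accept, ["text/plain", "text/html"]))        # text/html
```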
Despite the reliability of HTTP for communicating on the Internet, security issues remain with this protocol. Clients are often asked online for a great amount of personal information, and caution should be taken to ensure that information of a personal nature does not leak accidentally to other sources through HTTP. For this, it is important that users be provided with a convenient interface through which they can control the dissemination of such information. Another security threat comes from the server, which is capable of saving personal data about a user's requests that can easily identify confidential information about the client; its handling should therefore be constrained by law, ensuring that people who use the HTTP protocol to obtain others' confidential data for some legal reason do not distribute it without the permission of the individuals who would be identifiable if the results were published. HTTP is a generic data transfer protocol; consequently, it cannot regulate the content of the data transferred, nor is there any predetermined method for deciding how sensitive a piece of information is in the context of a given request. A solution again lies in supplying the information provider with as much control as possible over such information.
More security threats are posed by software: revealing the specific version of server software raises the possibility that the server becomes susceptible to attacks directed at software with known security holes. Proxies are also important with respect to security, especially when serving as portals through a network firewall, where they should ensure that the identity of hosts behind the firewall is not revealed. Proxies also have access to confidential information about individual users and organizations, as well as to proprietary information belonging to users and content providers. A proxy that is compromised, or implemented or configured without consideration for issues of privacy and security, may be used in various potential attacks. Caution should also be exercised by servers in restricting the documents returned in response to HTTP requests to those their administrators intend; for instance, if a server translates an HTTP URI directly into the file system, it should take care that the files served are the correct ones, meant for delivery to HTTP clients. In a similar manner, files meant only for the server's internal reference should be guarded against inappropriate retrieval, since they may contain sensitive information.
There is also a threat to HTTP clients, which depend greatly on the DNS (Domain Name Service) and are thus exposed to security attacks based on the deliberate misassociation of IP addresses and DNS names. Moreover, present HTTP clients and user agents are designed to preserve authentication information for an indefinite period, as HTTP does not provide a method by which servers can direct clients to dispose of these cached credentials. Because of these many security issues, there is a need for further extensions to HTTP that will allow a more secure transfer of data. (Fielding et al., 1999)
Allman, M. & Ostermann, S. 1999. “FTP Security Considerations”. 2008. Web.
Fielding, R., Gettys, J., Mogul, J.C., Frystyk, H., Masinter, L., Leach, P. & Berners-Lee, T. 1999. “Hypertext Transfer Protocol – HTTP/1.1”. 2008. Web.
FTP, n.d. “What exactly is FTP and how does it work?” 2008. Web.
FTPplanet, n.d. “Beginner’s guide to FTP”. 2008. Web.
Gilbert, H. 1995. “Introduction to TCP/IP”. 2008. Web.
Kristol, D. M. n.d. “HTTP”. 2008. Web.
Postel, J. & Reynolds, J. 1985. “File Transfer Protocol (FTP)”. 2008. Web.