IPC: Inter-Process Communication

1. Introduction

This project builds on my earlier multithreaded project, in which I developed a simplified web server. In this project, the existing getfile server is turned into a proxy server, and a simple cache server is implemented. The project consists of two parts, focusing on inter-process communication (IPC) and the use of shared memory.

GitHub Link

2. Environment

Linux Ubuntu
C
Visual Studio Code

3. Step A: Proxy Server Implementation

3.1 Tasks

Transform the existing getfile server into a proxy server that accepts GETFILE requests and translates them into HTTP requests.

3.2 Design Choice

Use of libcurl:
libcurl is a library for transferring data with URLs. It supports a wide range of protocols and provides easy-to-use interfaces. I implemented the handle_with_curl function to manage HTTP requests and responses. The function constructs a URL from the base URL and the requested path, performs the request, and handles the response based on the HTTP status code.
Dynamic Memory Allocation:
Initially, the buffer was statically allocated, which limited how much data could be handled. I switched to dynamic memory allocation so that the buffer matches the size of the web content: realloc grows the buffer as each new data chunk arrives, which keeps the data intact and avoids overflows.
Error Handling:
Proper error handling ensures robustness and reliability. By checking HTTP status codes, I differentiate between successful requests and different error states, such as file not found (404). I used curl_easy_getinfo to retrieve HTTP status codes and tailored responses to the client based on these codes.

				
#include <stdio.h>
#include <stdlib.h>
#include <curl/curl.h>
// gfcontext_t, gfs_sendheader(), gfs_send(), MAX_REQUEST_N, SERVER_FAILURE,
// struct MemoryStruct and WriteMemoryCallback come from the project's own files.

// curl_global_init(CURL_GLOBAL_ALL) is called once in main() before the worker
// threads start; libcurl expects one call per program, not one per request.
ssize_t handle_with_curl(gfcontext_t *ctx, const char *path, void *arg) {
    CURL *curl_handle;
    CURLcode res;
    struct MemoryStruct chunk;
    char url[MAX_REQUEST_N];          // full URL: base URL + requested path
    ssize_t result = SERVER_FAILURE;  // returned unless the file is sent

    // Initialize the MemoryStruct; WriteMemoryCallback grows it with realloc
    chunk.memory = malloc(1);
    chunk.size = 0;
    if (chunk.memory == NULL) {
        return SERVER_FAILURE;
    }

    curl_handle = curl_easy_init();
    if (!curl_handle) {
        free(chunk.memory);
        return SERVER_FAILURE;
    }

    // Construct the full URL from the base URL (arg) and the path
    snprintf(url, sizeof(url), "%s%s", (char *)arg, path);

    // Set curl options: target URL and in-memory write callback
    curl_easy_setopt(curl_handle, CURLOPT_URL, url);
    curl_easy_setopt(curl_handle, CURLOPT_WRITEFUNCTION, WriteMemoryCallback);
    curl_easy_setopt(curl_handle, CURLOPT_WRITEDATA, (void *)&chunk);

    // Perform the request; the whole body is buffered in chunk
    res = curl_easy_perform(curl_handle);

    if (res == CURLE_OK) {
        long http_code = 0;
        curl_easy_getinfo(curl_handle, CURLINFO_RESPONSE_CODE, &http_code);
        if (http_code == 200) {
            gfs_sendheader(ctx, GF_OK, chunk.size);   // header first
            gfs_send(ctx, chunk.memory, chunk.size);  // then the content
            printf("file sent successfully\n");
            result = (ssize_t)chunk.size;
        } else if (http_code == 404) {
            printf("file not found\n");
            gfs_sendheader(ctx, GF_FILE_NOT_FOUND, 0);
        } else {
            printf("server error\n");
            gfs_sendheader(ctx, GF_ERROR, 0);
        }
    } else {
        fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        gfs_sendheader(ctx, GF_FILE_NOT_FOUND, 0);
    }

    // Cleanup on every path
    curl_easy_cleanup(curl_handle);
    free(chunk.memory);
    return result;
}
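handle_with_curl relies on struct MemoryStruct and WriteMemoryCallback, which the write-up does not show. A minimal sketch, modeled on the in-memory download pattern from libcurl's getinmemory example and assuming only the two fields the code uses (memory and size): each call appends one received chunk and grows the buffer with realloc. The sketch omits <curl/curl.h> so it can be exercised standalone; its signature is compatible with a CURLOPT_WRITEFUNCTION callback.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable buffer filled by the write callback (assumed layout). */
struct MemoryStruct {
    char *memory;
    size_t size;
};

/* libcurl may call this many times per transfer, once per received chunk. */
static size_t WriteMemoryCallback(void *contents, size_t size, size_t nmemb, void *userp) {
    size_t realsize = size * nmemb;
    struct MemoryStruct *mem = (struct MemoryStruct *)userp;

    /* Grow the buffer to hold the new chunk plus a terminating NUL. */
    char *ptr = realloc(mem->memory, mem->size + realsize + 1);
    if (ptr == NULL) {
        /* Returning anything other than realsize makes libcurl abort the
           transfer; the old buffer is still valid and owned by mem. */
        return 0;
    }
    mem->memory = ptr;
    memcpy(&mem->memory[mem->size], contents, realsize);
    mem->size += realsize;
    mem->memory[mem->size] = '\0';
    return realsize;
}
```

Starting from malloc(1) and letting realloc grow the buffer means the proxy never needs to know the file size in advance; the cost is that the whole file is held in memory before gfs_send is called.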

3.3 Flow control

The flow of control in the proxy server is as follows:

Initialization:
- The server initializes by setting up libcurl and preparing to handle incoming connections.
Request Handling:
- For each incoming request, construct the full URL using the base URL and requested path.
- Use libcurl to perform an HTTP GET request.
Response Handling:
- Check the result of curl_easy_perform.
- If the request is successful, retrieve the HTTP status code.
- Send appropriate headers:
  - HTTP 200: Send the data with GF_OK.
  - HTTP 404: Send GF_FILE_NOT_FOUND.
  - Other errors: Send GF_ERROR.
- Send the data to the client.
- Check that all data was sent, then continue with the next request.
Cleanup:
- Free allocated memory and cleanup libcurl resources.
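The request and response steps above reduce to two small decisions: building the URL without silent truncation, and mapping the transfer result to a GETFILE status. A sketch with hypothetical names (build_url, map_http_status, and SKETCH_GF_* standing in for the project's GF_* constants). Unlike a bare snprintf call, build_url checks snprintf's return value, which reports the length the full string would have needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the project's GF_OK / GF_FILE_NOT_FOUND / GF_ERROR. */
typedef enum {
    SKETCH_GF_OK,
    SKETCH_GF_FILE_NOT_FOUND,
    SKETCH_GF_ERROR
} sketch_gf_status;

/* Write base + path into url[url_len]. Returns 0 on success, or -1 on an
   encoding error or if the result would not fit (it would be truncated). */
static int build_url(char *url, size_t url_len, const char *base, const char *path) {
    int needed = snprintf(url, url_len, "%s%s", base, path);
    if (needed < 0 || (size_t)needed >= url_len)
        return -1;
    return 0;
}

/* Map a transfer result to a GETFILE status following the rules in 3.3.
   transfer_ok stands for (curl_easy_perform(...) == CURLE_OK). */
static sketch_gf_status map_http_status(int transfer_ok, long http_code) {
    if (!transfer_ok)
        return SKETCH_GF_FILE_NOT_FOUND;  /* what handle_with_curl sends on transfer failure */
    if (http_code == 200)
        return SKETCH_GF_OK;
    if (http_code == 404)
        return SKETCH_GF_FILE_NOT_FOUND;
    return SKETCH_GF_ERROR;               /* any other status code */
}
```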

3.4 Test and Debug

Functional Testing:
- Tested using provided tools gfclient_download and gfclient_measure to ensure correct file retrieval and performance measurement.
- Simulated requests for both existing and non-existing files to verify correct HTTP status handling.
Edge Case Testing:
- Tested with large files to ensure dynamic memory allocation worked correctly.
- Verified proper error handling by inducing network errors and incorrect URLs.

4. Step B: Cache Server Implementation

4.1 Tasks

Implement a cache process that communicates with the proxy via shared memory. The shared-memory functions are defined in shm_channel.[ch].

4.2 Design Choice

1. Shared Memory Structure:

The program defines two structures. shared_memory_t stores the overall state of the shared memory: the number of segments, the segment size, the total size, and a pointer to the first segment. shared_memory_segment_t holds the per-segment state: the total file size, the size of the valid data, a status field, the data storage area, and three semaphores for synchronization. The advantage of this layout is that the information for each segment, as well as the overall properties of the shared memory, can be reached by computing pointer offsets.

				
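#include <semaphore.h>
#include <stddef.h>

typedef struct {
    size_t data_size;        // size of the valid data currently in the segment
    size_t total_file_size;  // total size of the file being transferred
    int status;              // set by the cache: -1 = not found, 1 = found
    sem_t full_semaphore;    // posted by the cache when a chunk is ready to read
    sem_t empty_semaphore;   // posted by the proxy when the segment can be rewritten
    sem_t start_semaphore;   // posted by the cache when the transfer starts
    char data[];             // flexible array member: segment_size bytes of payload
} shared_memory_segment_t;

typedef struct {
    int segment_count;                // number of segments, set by webproxy
    size_t segment_size;              // payload size of one segment
    size_t total_size;                // total size of the shared-memory region
    shared_memory_segment_t *slots;   // first segment; an absolute address is only valid
                                      // in the process that stored it, so each process
                                      // should recompute it from its own mapping
} shared_memory_t;

struct segment_msg {
    long mtype;                      // message type
    int segment_id;                  // index of the shm segment to use
    char filepath[MAX_PATH_LENGTH];  // requested file path
};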
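The shared-memory functions call segment_pointer(), which is not shown. A hypothetical implementation, assuming the layout described above (header, then segment_count segments of header plus payload): it derives each address from the calling process's own mapping rather than from a stored absolute pointer, since the proxy and cache may map the region at different addresses. It also rounds offsets up to the segment header's alignment so the sem_t fields stay aligned when segment_size is not a multiple of that alignment; the creation code would then have to size the region with the same helper (shared_memory_total_size, also hypothetical).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the structures in the write-up. */
typedef struct {
    size_t data_size;
    size_t total_file_size;
    int status;
    sem_t full_semaphore;
    sem_t empty_semaphore;
    sem_t start_semaphore;
    char data[];
} shared_memory_segment_t;

typedef struct {
    int segment_count;
    size_t segment_size;
    size_t total_size;
    shared_memory_segment_t *slots;
} shared_memory_t;

#define SEG_ALIGN _Alignof(shared_memory_segment_t)

/* Round n up to the next multiple of SEG_ALIGN. */
static size_t round_up(size_t n) {
    return (n + SEG_ALIGN - 1) / SEG_ALIGN * SEG_ALIGN;
}

/* Distance between consecutive segment headers (header + payload, aligned). */
static size_t segment_stride(size_t segment_size) {
    return round_up(sizeof(shared_memory_segment_t) + segment_size);
}

/* Bytes to pass to ftruncate()/mmap() so that this layout fits. */
static size_t shared_memory_total_size(int segment_count, size_t segment_size) {
    return round_up(sizeof(shared_memory_t)) + (size_t)segment_count * segment_stride(segment_size);
}

/* Locate segment `index` relative to this process's own mapping. */
static shared_memory_segment_t *segment_pointer(shared_memory_t *shm_ptr, int index) {
    char *first = (char *)shm_ptr + round_up(sizeof(shared_memory_t));
    return (shared_memory_segment_t *)(first + (size_t)index * segment_stride(shm_ptr->segment_size));
}
```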

2. Shared Memory and Message Queues:

To let the proxy and cache processes communicate and to move large amounts of data efficiently, the design combines shared memory with message queues. Shared memory stores the file data that both processes access, while message queues distribute tasks. This has the benefit of separating the data channel (file contents) from the command channel (requests): the server dispatches tasks to the cache process over the command channel, and the cache returns the files through the data channel.
The server creates the shared memory and initializes its metadata with the create_shared_memory function defined in shm_channel.[ch], which sets the fields of shared_memory_t from the provided int segment_count and size_t segment_size. Because the structure sizes are known, the starting address of each shared_memory_segment_t can be computed as an offset from the base address, which makes it possible to configure each segment's fields and initialize its semaphores.
Meanwhile, the cache process repeatedly checks whether the shared-memory object has been created. Once it detects that the shared memory has been initialized, it maps the region into its own virtual address space, opens the message queue to listen for incoming messages, and places each received message into its task queue.
In each of the server's worker threads, the callback function handle_with_cache is invoked. It sends the task information to the cache via the message queue. The task message is struct segment_msg, defined above, which carries the message type, the index of the assigned segment, and the requested file path.
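Because struct segment_msg begins with a long mtype field, it matches the System V msgsnd()/msgrcv() message format. A minimal sketch assuming that API; send_task, receive_task, the mtype value 1 and MAX_PATH_LENGTH = 256 are assumptions for illustration. Note that the size passed to msgsnd/msgrcv excludes the leading mtype field.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define MAX_PATH_LENGTH 256  /* assumed value; the project defines its own */

struct segment_msg {
    long mtype;                       /* must be > 0 for msgsnd */
    int segment_id;                   /* index of the shm segment to use */
    char filepath[MAX_PATH_LENGTH];   /* requested file path */
};

/* Payload size for msgsnd/msgrcv: everything after mtype. */
#define SEGMENT_MSG_PAYLOAD (sizeof(struct segment_msg) - sizeof(long))

/* Proxy side: enqueue one task for the cache. Returns 0 on success, -1 on error. */
static int send_task(int qid, int segment_id, const char *path) {
    struct segment_msg msg;
    memset(&msg, 0, sizeof msg);
    msg.mtype = 1;                                     /* single task type (assumed) */
    msg.segment_id = segment_id;
    strncpy(msg.filepath, path, MAX_PATH_LENGTH - 1);  /* stays NUL-terminated */
    return msgsnd(qid, &msg, SEGMENT_MSG_PAYLOAD, 0);
}

/* Cache side: block until a task arrives. Returns 0 on success, -1 on error. */
static int receive_task(int qid, struct segment_msg *out) {
    ssize_t n = msgrcv(qid, out, SEGMENT_MSG_PAYLOAD, 0, 0);  /* type 0: any message */
    return n < 0 ? -1 : 0;
}
```

For two independently started processes, both sides would call msgget() with the same key, for example one derived with ftok(), instead of IPC_PRIVATE.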

				
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

shared_memory_t* create_shared_memory(int segment_count, size_t segment_size) {
    // Create (or open) the POSIX shared-memory object
    int shm_fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0666);
    if (shm_fd == -1) {
        perror("Failed to open shared memory");
        return NULL;
    }
    // Header followed by segment_count segments (segment header + payload each)
    size_t total_size = sizeof(shared_memory_t) +
                        (size_t)segment_count * (sizeof(shared_memory_segment_t) + segment_size);
    if (ftruncate(shm_fd, total_size) == -1) {
        perror("Failed to set shared memory size");
        close(shm_fd);
        return NULL;
    }
    printf("Attempting mmap with size: %zu\n", total_size);
    shared_memory_t *shm_ptr = mmap(NULL, total_size, PROT_READ | PROT_WRITE, MAP_SHARED, shm_fd, 0);
    close(shm_fd);  // the mapping stays valid after the descriptor is closed
    if (shm_ptr == MAP_FAILED) {
        perror("Failed to map shared memory");
        return NULL;
    }
    // ..................... (setting arguments)

    return shm_ptr;
}
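On the cache side, the polling described above (wait for the proxy to create the shared memory, then map it) might look like the following minimal sketch. attach_shared_memory, pause_ns, max_tries and delay_ns are hypothetical; in the project the name would be SHM_NAME. It opens without O_CREAT so the cache never creates the object itself, and treats a size of zero as "not ready yet" because ftruncate() may not have run.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Sleep for delay_ns nanoseconds (must be below one second). */
static void pause_ns(long delay_ns) {
    struct timespec ts = { 0, delay_ns };
    nanosleep(&ts, NULL);
}

/* Wait for the shared-memory object to exist and be sized, then map it.
   Returns the mapping and stores its length in *size_out, or NULL. */
static void *attach_shared_memory(const char *name, size_t *size_out,
                                  int max_tries, long delay_ns) {
    for (int attempt = 0; attempt < max_tries; attempt++) {
        int fd = shm_open(name, O_RDWR, 0);        /* no O_CREAT: attach only */
        if (fd == -1) {
            if (errno != ENOENT) {
                perror("shm_open");
                return NULL;
            }
            pause_ns(delay_ns);                     /* not created yet */
            continue;
        }
        struct stat st;
        if (fstat(fd, &st) == -1 || st.st_size == 0) {
            close(fd);                              /* exists but not sized yet */
            pause_ns(delay_ns);
            continue;
        }
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        close(fd);                                  /* mapping outlives the fd */
        if (p == MAP_FAILED) {
            perror("mmap");
            return NULL;
        }
        *size_out = (size_t)st.st_size;
        return p;
    }
    return NULL;                                    /* gave up waiting */
}
```

A non-zero size only shows that ftruncate() has run, not that the segment headers and semaphores are initialized; this relies on the proxy finishing initialization before it sends any task message.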

3. Boss-worker Threads and Task Steque

The goal is to handle multiple cache requests concurrently, improving performance and responsiveness. A pool of worker threads is created, each waiting on a shared queue for tasks. This design uses resources efficiently and simplifies task distribution.
On the server side, the webproxy creates multiple threads and a task queue during initialization. The queue contains the index of each shared-memory segment. Each worker thread takes a segment index from the front of the queue and, once its transfer is complete, puts the index back at the end of the queue. This keeps every segment in use, and because an index is held by only one worker at a time, no two transfers share a segment.
On the cache side, the boss thread places received messages into the task queue, and the worker threads take messages from the queue and extract the segment ID and the requested file path. Both the proxy and the cache can therefore handle requests and transfer files concurrently.
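The segment-index queue on the proxy side can be sketched as follows. steque is the project's queue library; this sketch substitutes a hypothetical fixed-size ring buffer (segment_queue_t, sq_init, sq_take, sq_put_back and MAX_SEGMENTS are illustrative names) guarded by a pthread mutex and condition variable, so a worker blocks while every segment is busy instead of spinning.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>

#define MAX_SEGMENTS 64   /* hypothetical upper bound for this sketch */

/* Blocking FIFO of free segment indices shared by the worker threads. */
typedef struct {
    int items[MAX_SEGMENTS];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
} segment_queue_t;

static void sq_init(segment_queue_t *q, int segment_count) {
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    for (int i = 0; i < segment_count && i < MAX_SEGMENTS; i++)
        q->items[q->count++] = i;          /* every segment starts out free */
}

/* Take a free segment index, blocking while all segments are in use. */
static int sq_take(segment_queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);  /* loop handles spurious wakeups */
    int idx = q->items[q->head];
    q->head = (q->head + 1) % MAX_SEGMENTS;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return idx;
}

/* Return a segment index to the tail once its transfer is finished. */
static void sq_put_back(segment_queue_t *q, int idx) {
    pthread_mutex_lock(&q->lock);
    q->items[(q->head + q->count) % MAX_SEGMENTS] = idx;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}
```

Because only indices previously taken are put back, the buffer never holds more than segment_count entries, so it cannot overflow as long as segment_count is at most MAX_SEGMENTS.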

4. Signal Handling:

Among the three semaphores, start_semaphore is used by the cache to tell the proxy that the task has been received and the file transfer has started. full_semaphore and empty_semaphore coordinate writing and reading of the segment data. empty_semaphore is initialized to 1, and full_semaphore and start_semaphore are initialized to 0.
The cache writes the file in chunks, each no larger than the segment size to prevent overflow. Before writing a chunk it waits on empty_semaphore, and after writing it posts full_semaphore to indicate that the proxy can start reading.

				
ssize_t write_to_shared_memory(shared_memory_t *shm_ptr, int slot_index, const char *data, size_t size) {
    // Validate the index before computing the segment address
    if (slot_index < 0 || slot_index >= shm_ptr->segment_count) {
        fprintf(stderr, "Invalid slot index\n");
        return -1;
    }
    // A chunk must be non-empty and fit in one segment
    if (size == 0 || size > shm_ptr->segment_size) {
        fprintf(stderr, "Invalid data size (must be 1..segment_size)\n");
        return -1;
    }
    shared_memory_segment_t *slots = segment_pointer(shm_ptr, slot_index);
    sem_wait(&slots->empty_semaphore);   // wait until the proxy has drained the segment
    memcpy(slots->data, data, size);
    slots->data_size = size;
    sem_post(&slots->full_semaphore);    // signal that a chunk is ready to read
    return (ssize_t)size;
}

After receiving the start signal, the proxy first checks the file status and total file size stored in shared_memory_segment_t. A status of -1 means the file does not exist, so the proxy sends a GF_FILE_NOT_FOUND header; a status of 1 means it sends a GF_OK header. The proxy then waits on full_semaphore before reading each chunk and forwards the chunks to the client. After reading a chunk, it posts empty_semaphore to signal that the segment has been read and the cache can write again.

				
ssize_t read_from_shared_memory(shared_memory_t *shm_ptr, int slot_index, char *buffer, size_t buffer_size) {
    // Validate the index before computing the segment address
    if (slot_index < 0 || slot_index >= shm_ptr->segment_count) {
        fprintf(stderr, "Invalid slot index\n");
        return -1;
    }
    shared_memory_segment_t *slots = segment_pointer(shm_ptr, slot_index);
    sem_wait(&slots->full_semaphore);    // wait until the cache has written a chunk
    if (slots->data_size > buffer_size) {
        fprintf(stderr, "Slot %d: data size: %zu > Buffer size: %zu\n", slot_index, slots->data_size, buffer_size);
        sem_post(&slots->full_semaphore);  // leave the chunk readable for a retry
        return -1;
    }
    size_t data_size = slots->data_size;
    memcpy(buffer, slots->data, data_size);
    sem_post(&slots->empty_semaphore);   // segment drained: the cache may write again
    return (ssize_t)data_size;
}
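To make the handshake concrete, here is a self-contained sketch that runs the cache (writer) and proxy (reader) roles as two threads of one process over a single segment. demo_segment_t, CHUNK, writer_arg_t, cache_writer and proxy_reader are hypothetical names. The semaphores use pshared = 1 and the initial values from the text (empty = 1, full = 0, start = 0), so the same logic applies when the segment lives in memory shared by two processes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 1024  /* hypothetical segment payload size */

/* Simplified segment: same fields as shared_memory_segment_t, fixed-size data. */
typedef struct {
    size_t data_size;
    size_t total_file_size;
    int status;                       /* 1 = found, -1 = not found */
    sem_t full_semaphore;
    sem_t empty_semaphore;
    sem_t start_semaphore;
    char data[CHUNK];
} demo_segment_t;

static void demo_segment_init(demo_segment_t *s) {
    /* pshared = 1: required once the segment lives in memory shared by two processes */
    sem_init(&s->empty_semaphore, 1, 1);  /* segment starts out writable */
    sem_init(&s->full_semaphore, 1, 0);   /* nothing to read yet */
    sem_init(&s->start_semaphore, 1, 0);  /* proxy waits for the cache's go-ahead */
}

typedef struct {
    demo_segment_t *seg;
    const char *src;   /* stands in for the cached file */
    size_t len;
} writer_arg_t;

/* Cache side: publish status and size, post start, then stream chunks. */
static void *cache_writer(void *p) {
    writer_arg_t *a = p;
    a->seg->status = 1;
    a->seg->total_file_size = a->len;
    sem_post(&a->seg->start_semaphore);
    size_t off = 0;
    while (off < a->len) {
        size_t n = (a->len - off < CHUNK) ? a->len - off : CHUNK;
        sem_wait(&a->seg->empty_semaphore);   /* wait until the proxy drained it */
        memcpy(a->seg->data, a->src + off, n);
        a->seg->data_size = n;
        sem_post(&a->seg->full_semaphore);    /* chunk ready for the proxy */
        off += n;
    }
    return NULL;
}

/* Proxy side: wait for start, then copy chunks out until total_file_size
   bytes have arrived. Returns the number of bytes received. */
static size_t proxy_reader(demo_segment_t *seg, char *out) {
    sem_wait(&seg->start_semaphore);
    if (seg->status != 1)
        return 0;                             /* proxy would send GF_FILE_NOT_FOUND */
    size_t received = 0;
    while (received < seg->total_file_size) {
        sem_wait(&seg->full_semaphore);
        memcpy(out + received, seg->data, seg->data_size);
        received += seg->data_size;
        sem_post(&seg->empty_semaphore);      /* cache may write the next chunk */
    }
    return received;
}
```

After the final chunk, the three semaphores are back at their initial values (empty = 1, full = 0, start = 0), so a segment returned to the queue can serve the next request without being re-initialized.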

4.3 Flow of control

Proxy Side
1. Initialization:
  - Initialize the shared resources and put the segment IDs into the task queue.
  - Create worker threads and have them listen on the task queue.
2. Main Loop:
  - Continuously receive requests from clients and put them into the task queue.
3. Task Processing (Worker Threads):
  - Take a free segment ID from the task queue.
  - Send the requested file path and the segment ID to the cache process via the message queue.
  - Wait for start_semaphore from the cache.
  - Read the status and total file size, and send the header to the client.
  - Receive the file content chunk by chunk and forward it to the client in a loop.
  - Exit the loop once the number of bytes received equals the file size.
  - Put the segment ID back at the end of the task queue and return the number of bytes sent.
4. Cleanup:
  - Once the exit flag is set, release all allocated resources such as shared memory, message queues, and mutexes.
Cache Side
1. Initialization:
  - Map the shared resources, such as the shared memory and the message queue.
  - Create the worker threads and the task queue.
  - Listen on the message queue.
2. Main Loop:
  - Continuously receive tasks from the message queue.
  - Put each message into the task queue.
3. Task Processing (Worker Threads):
  - Wait for tasks to become available in the queue.
  - Upon receiving a task, read the file path and segment ID.
  - Set the file status and total size in shared memory based on the result from simplecache.[ch].
  - Post start_semaphore to notify the proxy.
  - Write the file into the shared-memory segment chunk by chunk in a loop.
  - Once all data is written, move on to the next task.
4. Cleanup:
  - Once the exit flag is set, release all allocated resources such as shared memory and message queues.
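The cleanup steps in both flows can be sketched as a single teardown helper for the process that created the resources. release_ipc is a hypothetical name; it covers the mapping, the POSIX shared-memory name, and a System V message queue (the queue type is inferred from struct segment_msg's long mtype field). Per-segment semaphores would be destroyed with sem_destroy() before this, once no thread is blocked on them.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/msg.h>
#include <unistd.h>

/* Keeps going after an error; returns 0 only if every step succeeded. */
static int release_ipc(void *shm, size_t shm_size, const char *shm_name, int msg_qid) {
    int rc = 0;
    if (shm != NULL && munmap(shm, shm_size) == -1) {
        perror("munmap");
        rc = -1;
    }
    /* Removes the name; the memory is freed once every process has unmapped it. */
    if (shm_unlink(shm_name) == -1 && errno != ENOENT) {
        perror("shm_unlink");
        rc = -1;
    }
    /* Removes the queue immediately; blocked msgrcv() callers fail with EIDRM. */
    if (msg_qid != -1 && msgctl(msg_qid, IPC_RMID, NULL) == -1) {
        perror("msgctl(IPC_RMID)");
        rc = -1;
    }
    return rc;
}
```

The cache process would typically only munmap() its own mapping and leave unlinking to the proxy, so each name is removed exactly once.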

4.4 Test and debug

Functional Testing:
- Tested using the provided tools gfclient_download and gfclient_measure to verify correct file retrieval and to measure performance. After compiling, three terminals were opened to run the client, the server, and the cache. When a transfer failed, print statements were added to both the server and the cache to locate the failure; gdb was then used to set breakpoints at the relevant locations and inspect the values of variables.
- The most basic test sent a single request for one file from a single-threaded client. Once that succeeded, the next phase varied parameters such as the number of requests, the number of threads, the number and size of segments, and the number of threads on the cache side, in order to observe transfer performance and the robustness of the program.
Edge Case Testing:
- Tested with large files to ensure transfer performance.
- Tested with sending of a large volume of requests.
- Tested with the handling of incorrect file paths or files that could not be found.
- Tested with multiple transmission of small-sized files.
- Tested with multithreaded file request submissions.
- Tested with varying quantities and sizes of segments.
- Tested with the integration of requests from multiple clients.
- Additionally, I conducted some tests using the CS6200 tools provided by students.

5. Reference

During development, the following resources were consulted: