Research Article

Generation of Web Pages from Document Image

Published on June 2014 by Aparna Halbe, Abhijit R. Joshi
International Conference and workshop on Advanced Computing 2014
Foundation of Computer Science USA
ICWAC2014 - Number 2
June 2014
Authors: Aparna Halbe, Abhijit R. Joshi

Aparna Halbe, Abhijit R. Joshi. Generation of Web Pages from Document Image. International Conference and workshop on Advanced Computing 2014. ICWAC2014, 2 (June 2014), 0-0.

author = { Aparna Halbe, Abhijit R. Joshi },
title = { Generation of Web Pages from Document Image },
journal = { International Conference and workshop on Advanced Computing 2014 },
issue_date = { June 2014 },
volume = { ICWAC2014 },
number = { 2 },
month = { June },
year = { 2014 },
issn = 2249-0868,
pages = { 0-0 },
numpages = 1,
url = { /proceedings/icwac2014/number2/649-1434/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
%0 Proceeding Article
%1 International Conference and workshop on Advanced Computing 2014
%A Aparna Halbe
%A Abhijit R. Joshi
%T Generation of Web Pages from Document Image
%J International Conference and workshop on Advanced Computing 2014
%@ 2249-0868
%V ICWAC2014
%N 2
%P 0-0
%D 2014
%I International Journal of Applied Information Systems

Software projects typically begin with a requirement specification, followed by user interface (UI) design. The UI design is usually drawn on paper first, and web designers then build the web pages to match the drawing, using markup languages such as HTML and XML to design and publish them on the internet. This paper proposes a novel approach that takes over this part of the web designer's job: a system that converts a UI design drawn on paper into an HTML page. The system takes a scanned image of the UI design as input and generates an HTML page of that UI as output. Doing so requires converting a paper document image into a hyper document. Existing work in this area is restricted to converting the text and images on a document; it does not consider HTML controls such as textboxes, radio buttons, checkboxes, and buttons. An existing system therefore only converts the paper document into a hypertext document and does not identify each HTML control as a separate component, which is a primary requirement when designing a UI. Given a UI design containing different HTML controls, such a system would produce an HTML page without any functionality: the page would contain an image of the UI design rather than actual HTML controls. The proposed work addresses these issues and considers most of the HTML controls required for designing static pages.
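
The final step described in the abstract, emitting actual HTML controls rather than an image of the design, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that an earlier image-analysis stage has already located and classified each control, and the `DetectedControl` structure, the control names, and both function names are hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class DetectedControl:
    """One UI element found in the scanned design (hypothetical structure)."""
    kind: str       # "label", "textbox", "checkbox", "radio", or "button"
    x: int          # left edge of the bounding box, in pixels
    y: int          # top edge of the bounding box, in pixels
    text: str = ""  # recognised caption, if any


def control_to_html(control: DetectedControl, index: int) -> str:
    """Map one detected control to a real HTML element."""
    name = f"field{index}"
    caption = escape(control.text)
    if control.kind == "textbox":
        return f'<input type="text" name="{name}">'
    if control.kind == "checkbox":
        return f'<label><input type="checkbox" name="{name}"> {caption}</label>'
    if control.kind == "radio":
        # Simplification: every radio button joins a single group.
        return f'<label><input type="radio" name="choice"> {caption}</label>'
    if control.kind == "button":
        return f'<button type="button">{caption}</button>'
    # Anything else is treated as plain text.
    return f"<span>{caption}</span>"


def generate_page(controls, title="Generated UI"):
    """Emit a static HTML page with controls in reading order."""
    # Reading order: top to bottom, then left to right.
    ordered = sorted(controls, key=lambda c: (c.y, c.x))
    rows = [f"  <p>{control_to_html(c, i)}</p>" for i, c in enumerate(ordered)]
    body = "\n".join(rows)
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(title)}</title></head>\n"
        f"<body>\n<form>\n{body}\n</form>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    detected = [
        DetectedControl("button", 10, 80, "Submit"),
        DetectedControl("textbox", 60, 20),
        DetectedControl("label", 10, 20, "Name"),
    ]
    print(generate_page(detected))
```

Each detected control becomes its own element, so the generated page is usable as a form instead of being a picture of one. The sketch places every control in its own paragraph and ignores the finer layout that a real system would have to recover from the bounding boxes.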

Index Terms

Computer Science
Information Sciences


Image processing for GUI, rapid web development, GUI design processing, document images, automatic web page generation